Abstracts
Bridging the Data Gaps to Democratize AI in
Science, Education and Society
Ilkay Altintas
San Diego Supercomputer Center
and Workflows for Data Science (WorDS) Center of Excellence and WIFIRE Lab University of
California at San Diego, CA USA
The democratization of
Artificial Intelligence (AI) necessitates an ecosystem where data and
research infrastructure are seamlessly integrated and universally accessible.
This talk overviews the imperative of bridging the gaps between these
components through robust services, facilitating an inclusive AI landscape
that empowers diverse research communities and domains. The National Data
Platform (NDP) aims to lower the barriers to entry for AI research and
applications through an integrated services approach to streamline AI
workflows, from data acquisition to model deployment. This approach
underscores the importance of open, extensible, and equitable systems in
driving forward the capabilities of AI, ultimately contributing to the
resolution of grand scientific and societal challenges. Through examining
real case studies leveraging open data platforms and scalable research
infrastructure, the talk will highlight the role of composable systems and services in NDP in catalyzing a platform that empowers users from all backgrounds to engage in meaningful research, learning, and
discovery.
Back to
Session II
|
Towards an
Operational Crisis in HPC System Software: The File System Example
Frank Baetke
EOFS, European Open File System Organization,
Germany
The talk will address key
observations made at two EOFS workshops in 2022 and 2024, an EOFS panel at
ISC 2023 and focus sessions at the EuroHPC Summit
2024. Operational as well as educational aspects will be discussed.
Communication between
users of large HPC centers and the IT staff
responsible for system and storage/file-system management is becoming an
increasing problem as most users are unaware of or uninterested in the
operational aspects of large application programs and the associated
challenges of multiple storage hierarchies, demand scheduling, etc.
This kind of disconnect causes growing problems with load balancing, resource efficiency, and system responsiveness, and leads to frustration on both sides.
On the education side,
operating systems, storage, I/O and file systems are no longer considered
interesting and important topics in computer science/information technology
curricula. Lectures on operating systems, file systems, etc. have been
abandoned in favor of AI, web services and other
areas considered hot. Today, it is possible to earn a university degree in
computer science without ever having attended lectures on operating systems
and related middleware.
Back to Session III
|
What lies beyond the edge?
Pete
Beckman
Argonne National Laboratory, Argonne, IL, USA
AI is on the move —
bigger, smarter, and richer. Larger and more sophisticated models are being pushed to the edge. Smart infrastructure, smart sensors, and intelligent
scientific instruments are being deployed around the world in a new kind of
AI-enabled computing continuum. The
Sage (sagecontinuum.org) infrastructure allows scientists to deploy AI
algorithms to the edge (AI@Edge), to analyze and
autonomously respond to the highest resolution of data. The infrastructure
allows computer scientists to explore AI algorithms such as federated
learning, self-supervised learning, as well as
bi-directional interactions between instruments and computation. But what’s next? What lies beyond the edge?
Back to Session IV
|
Entering A New Frontier of AI Networking
Innovation
Gil
Bloch
NVIDIA, Santa
Clara, CA, USA
NVIDIA Quantum InfiniBand
and Spectrum-X Ethernet have emerged as the de facto network standards for
training and deploying AI at scale. InfiniBand’s in-network computing,
ultra-low latency, and high bandwidth capabilities have facilitated the creation of
larger and more complex foundational models. Spectrum-X is the first Ethernet
platform capable of supporting AI infrastructure, delivering networking
optimized for generative AI to hyperscale AI clouds and enterprises. We’ll
dive deep into the architectural aspects of NVIDIA Quantum-X800 InfiniBand
and Spectrum-X800 Ethernet platforms and their essential roles in
next-generation AI data center designs.
Back to Session III
|
Unlocking the Power
of AI: Leveraging Dense Linear Algebra and Large Language Models on Groq’s LPU
Ernesto Bonomi
GROQ, Mountain View, CA, USA
This presentation explores
the intersection of dense linear algebra and Large Language Models (LLMs), highlighting
Groq’s innovative Language Processing Unit (LPU)
as the foundation for a new generation of AI. By examining the LPU’s static
and deterministic dataflow computing paradigm, we will illustrate the
advantages of its SIMD architecture and scalability, demonstrating how these
enable LLMs to respond at unprecedented speed, with real-time processing
capabilities. We will also delve into the distinction between training and
inference, considering objectives, complexity, and costs. Finally, a live
demo will showcase the remarkable performance of the LPU, highlighting the
transformative potential of this technology.
Back to Session V
|
Future of HPC: Integrating quantum with
massively parallel computing
Antonio D. Corcoles
IBM Quantum, T.J. Watson Research Center, Yorktown Heights, NY, USA
As quantum computing systems
continue to scale in size and quality, and error resilience approaches start
to enable interesting computational regimes in what we call the era of
quantum utility, the integration of quantum with massively parallel computing
becomes critical to unlock the full potential of both technologies in a way
that exceeds the capabilities of either one alone. This integration is poised
to provide a rich environment for experts to experiment and optimize
resources in quantum algorithms and applications. Given the limitations in
efficiently emulating quantum applications to find optimal
implementations, direct interaction with evolving quantum hardware becomes
essential for application development. In this talk I will present the state
of the art of quantum computing and will touch on some architectural ideas
towards the integration of quantum and traditional HPC systems through a use
case that exhibits the interplay of both technologies in a heterogeneous
workflow.
Back to Session VIII
|
An Overview of High
Performance Computing and Responsibly Reckless Algorithms
Jack
Dongarra
Electrical Engineering
and Computer Science Department; Innovative Computing Laboratory, University
of Tennessee, Knoxville, TN, USA; Oak Ridge National Laboratory, USA and
University of Manchester, UK
In this talk we examine how high performance computing has changed over the last 10 years and look toward future trends. These changes have had, and will continue to have, a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.
Mixed precision numerical methods turn out to be
paramount for increasing the throughput of traditional and artificial
intelligence (AI) workloads beyond riding the wave of the hardware alone.
Reducing precision comes at the price of trading away some accuracy for
performance (reckless behavior) but in noncritical
segments of the workflow (responsible behavior) so
that the accuracy requirements of the application can still be satisfied.
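As a minimal, self-contained sketch of the mixed-precision idea described above (an illustration of classical iterative refinement in general, not the specific algorithms covered in the talk; matrix size, conditioning, and iteration count are arbitrary choices):

    import numpy as np

    def mixed_precision_solve(A, b, iters=5):
        # "Reckless" step: the expensive O(n^3) solve is done in low precision.
        # (A real code would factorize once and reuse the LU factors.)
        A32, b32 = A.astype(np.float32), b.astype(np.float32)
        x = np.linalg.solve(A32, b32).astype(np.float64)
        for _ in range(iters):
            # "Responsible" step: residual and correction in full fp64 precision.
            r = b - A @ x
            x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500)) + 500 * np.eye(500)  # well-conditioned test matrix
    b = rng.standard_normal(500)
    x = mixed_precision_solve(A, b)
    print("final residual norm:", np.linalg.norm(b - A @ x))

The low-precision solve runs at the much higher throughput of reduced-precision units, while the cheap full-precision residual correction restores the accuracy the application requires.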
Back to Session I
|
Quantum Computing at Leonardo: an industrial
end-user standpoint
Daniele Dragoni
Leonardo S.p.A., High Performance Computing Lab.,
Genova, Italy
Quantum Computing (QC) is an emerging paradigm that
offers the potential to solve complex problems that are considered
intractable within the classical/digital computing domain. While tangible
quantum advantages have yet to manifest in practical scenarios, numerous
industries are actively exploring its potential advantages, striving to
secure competitive edges within their respective sectors.
In my presentation, I will outline Leonardo’s
strategic approach to thoroughly evaluate the capabilities and limitations of
QC within the aerospace, security, and defense domains. I will delve into our
stance on QC from an industrial end-user perspective, illustrating examples
of ongoing initiatives and practical applications that we are pursuing
through integrated HPC and QC methodologies, aligning with national strategic
objectives.
Back to Session IX
|
Embodied agents as scientific assistants
Ian Foster
Argonne National Laboratory, Data Science and
Learning Division, Argonne, IL and Dept. of Computer Science, The University
of Chicago, Chicago, IL, USA
An embodied agent is a
computational entity that can interact with the world through a physical body
or representation and adapt its actions based on learnings from these
interactions. I discuss the potential for such agents to serve as
next-generation scientific assistants, for example by acting as Cognitive
Partners and Laboratory Assistants. In the former case, agents, with their
machine learning and data-processing capabilities, complement the cognitive
processes of human scientists by offering real-time data analysis, hypothesis
generation, and experimental design suggestions; in the latter, they engage
directly with the scientific environment on the scientist’s behalf, for
example by performing experiments in bio-labs or running simulations on
supercomputers. I invite participants to envision a future in which human
scientists and embodied agents collaborate seamlessly, fostering an era of
accelerated scientific discoveries and broader horizons of understanding. I
hope to encourage debate about what technical advances will be required to
achieve this future, how we will ensure the safe and ethical use of such
agents, how and to what end we may seek to preserve human intuition, and the
possible redefinition of scientific discovery if machines are able to
theorize and validate.
Back to Session IV
|
SimOps, a New HPC Community Initiative Focusing on
Simplifying Use and Operation of Scientific and Engineering Simulations
Wolfgang
Gentzsch
The UberCloud,
Regensburg, GERMANY and Sunnyvale, CA, USA
In today’s fast-paced, competitive world,
engineering teams are under intense pressure to design high-quality,
innovative products in record time. The driving force behind this
acceleration is the growing reliance on engineering simulation in the product
design process. With an explosion in simulation software sophistication and
almost unlimited compute power, engineering teams rely on simulation to
create high-quality breakthrough products. Today, simulation is no longer a
luxury, it’s required for survival. From designing brand new products to
improving existing ones, simulation empowers companies to innovate, validate
ideas, and compete in a global economy.
But simulation engineers still face many hurdles
that limit their productivity and the quality of products they create, and
their contribution to their companies’ next generation products. New
applications like digital twins and artificial intelligence come with new
requirements for new software and hardware capabilities that further increase
the complexity of simulations and of the underlying computing infrastructure.
What can we do to master these new challenges and reduce the operational
burden on engineers and IT to manage complex HPC environments?
In this short presentation, we will announce a new
HPC community initiative that aims at reducing the challenges and the
operational burden on engineers and IT to use, operate, and manage complex
simulation environments that come with the ever-evolving
applications, technologies, and infrastructures. We will demonstrate a set of
Best Practices that support engineers and HPC experts in simplifying use and
operation of simulation environments to make them more productive, deliver
higher-quality results, and thus contribute to the success of their company.
Back to Session II
|
Novel Methodology for Application Performance
Modelling and Evaluation
Vladimir
Getov
Distributed and
Intelligent Systems Research Group, School of Computer Science and
Engineering, University of Westminster, London, United Kingdom
Computer simulation of physical real-world phenomena
emerged with the invention of electronic digital computing and has been
increasingly adopted as one of the most successful modern methods for
scientific discovery. Arguably, the main reason for this success has been
the rapid development of novel computer technologies that has led to the
creation of powerful supercomputers, large distributed systems,
high-performance computing frameworks with access to huge data sets, and high
throughput communications. In addition, unique and sophisticated scientific
instruments and facilities, such as giant electronic microscopes, nuclear
physics accelerators, or sophisticated equipment for medical imaging are
becoming integral parts of those complex computing infrastructures.
Subsequently, the term ‘e-science’ was quickly
embraced by the professional community to capture these new revolutionary
methods for scientific discovery via computer simulations of physical
systems. The relevant application codes are typically based on finite-element
algorithms, while the computations constitute heavy workloads that
conventionally are dominated by floating-point arithmetic. Examples include
application areas such as climate modeling, plasma
physics (fusion), medical imaging, fluid flow, and thermo-evolution.
Over the years, most of the relevant benchmarking
projects have covered predominantly dense physical system simulations, in
which high computational intensity carries over when parallel implementations
are built to solve bigger problems faster. Since emphasis was on dense
problems, this approach resulted in systems with increasing computational
performance and was the presumption behind the introduction of the very
popular semi-annual Top 500 rankings of supercomputers. However, in the last
10-15 years many new applications with very high economic potential have
emerged — such as big data analytics, machine learning, real-time feature
recognition, recommendation systems, and even physical simulations — that
feature irregular or dynamic solution grids. These applications spend much
more of their computation in non-floating-point operations such as address
computations and comparisons, with addresses that are no longer very regular
or cache-friendly. The computational intensity of such programs is far less than
for dense kernels, and the result is that for many real codes today, even
those in traditional scientific cases, the efficiency of the floating-point
units that have become the focal point of modern core architectures has
dropped from >90% to <5%. This emergence of applications with
data-intensive characteristics — e.g. with execution times dominated by data
access and data movement — has been recognized recently as the “3rd Locality
Wall” for advances in computer architecture.
To highlight the inefficiencies described above, and
to identify architectures which may be more efficient, a new benchmark called
HPCG (High Performance Conjugate Gradient) was introduced several years ago.
HPCG also solves Ax = b problems, but where A is a very sparse matrix so that,
on evaluated systems, floating-point efficiency mirrors that seen in full
scientific codes. Recent detailed analysis confirms that HPCG performance in
terms of useful floating-point operations is dominated by memory bandwidth to
the extent that the number of cores and their floating-point capabilities are
irrelevant. Therefore, our selected benchmark codes that cover the “Physical
System Simulations” application area of interest are the High-Performance
LINPACK (HPL) and the HPCG. Both are very popular codes with very good
regularity of results in recent years. Our approach is to explore a
3-dimensional space — dense systems performance, sparse systems performance,
and energy efficiency for both cases. With HPL as the representative of dense
system performance and HPCG as the representative for sparse systems
performance, the available benchmarking results provide excellent
opportunities for comparisons and interpretation, as well as lay out a
relatively well-balanced overall picture of the whole application domain for
physical system simulations.
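To make the computational-intensity argument concrete, the back-of-the-envelope sketch below compares flops per byte for a dense matrix-matrix multiply and a CSR sparse matrix-vector product; the counts are simplified estimates chosen for illustration, not measurements of HPL or HPCG themselves.

    # Simplified arithmetic-intensity estimates (double precision, 8-byte values).
    def dense_gemm_intensity(n):
        flops = 2 * n**3                  # n^3 multiply-adds
        bytes_moved = 3 * n**2 * 8        # read A and B, write C (ignoring cache reuse)
        return flops / bytes_moved

    def sparse_spmv_intensity(n, nnz_per_row):
        nnz = n * nnz_per_row
        flops = 2 * nnz                   # one multiply-add per stored nonzero
        # CSR: 8-byte value + 4-byte column index per nonzero, plus vector traffic
        bytes_moved = nnz * (8 + 4) + 2 * n * 8
        return flops / bytes_moved

    print("dense GEMM,  n=10000   :", dense_gemm_intensity(10_000), "flops/byte")
    print("sparse SpMV, 27 nnz/row:", sparse_spmv_intensity(10_000, 27), "flops/byte")

With well under one flop per byte moved, the sparse kernel is limited by memory bandwidth long before the floating-point units are busy, which is exactly the behavior HPCG is designed to expose.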
Back to Session X
|
Distributed Quantum Compiling
Vlad Gheorghiu
Institute for Quantum Computing, University of
Waterloo and SoftwareQ Inc, Waterloo, Ontario,
Canada
Quantum computing’s
potential to solve complex problems hinges on the ability to scale up for the
execution of large-scale quantum algorithms. One promising approach to
scalability is distributed quantum computing, where multiple nodes, each with
a relatively small number of qubits, are interconnected via
Einstein-Podolsky-Rosen (EPR) channels. These channels are generated on
demand and exhibit stochastic behavior, presenting
unique challenges in the distribution of logical circuits across the network.
In this talk, I present a novel distributed compiling strategy tailored for
such an architecture. Our approach effectively partitions quantum circuits
and maps them onto a network of interconnected quantum nodes, optimizing for
both performance and feasibility under the constraints of stochastic EPR
channel generation. I validate the compiling strategy through a series of
benchmark circuits, demonstrating its practical application and potential for
real-world quantum computing tasks. If time permits, I will also provide a
live demonstration of our distributed compiling method in action, showcasing
its effectiveness and operational viability.
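As a toy illustration of the underlying partitioning problem (a deliberately simplified model, not the compiling strategy presented in the talk; the example circuit and the one-EPR-pair-per-remote-gate cost model are assumptions made here for illustration), the sketch below splits a small circuit across two nodes and counts the two-qubit gates that straddle the cut:

    from itertools import combinations

    # Hypothetical 6-qubit circuit: list of two-qubit gates as (control, target).
    gates = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]

    def epr_cost(partition_a, gates):
        """Number of gates whose two qubits sit on different nodes."""
        a = set(partition_a)
        return sum((q0 in a) != (q1 in a) for q0, q1 in gates)

    # Brute-force the best balanced bipartition of the 6 qubits into 3 + 3.
    best = min(combinations(range(6), 3), key=lambda part: epr_cost(part, gates))
    print("node A qubits:", best, "| EPR pairs needed:", epr_cost(best, gates))

A realistic compiler must additionally account for the stochastic, on-demand generation of the EPR channels, gate scheduling, and more than two nodes, which is where the optimization becomes genuinely hard.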
Back to Session VII
|
Neutral-atom quantum computing within the
Munich Quantum Valley
Alexander Glätzle
CEO and Co-Founder planqc, Munich, Germany
Quantum computers
utilizing ultracold atoms confined in optical lattices exhibit exceptional
potential for addressing computationally complex problems. These systems
provide extended qubit coherence times, eliminate manufacturing variations,
and scale to thousands of qubits, all while operating at room temperature. In
this talk, we present the development of digital quantum computers within the
Munich Quantum Valley, a collaborative effort between the Max Planck
Institute of Quantum Optics, the Leibniz Supercomputing Center,
and planqc. Our focus is on integrating a neutral
atom quantum computer into a high-performance computing environment to
achieve quantum-accelerated HPC.
Back to Session X
|
Accelerating Extreme HPC Scale-out and AI
Environments
Frank Herold
ThinkParQ GmbH, Germany
This session will outline how we have driven the development of key features and functions, and tackled key challenges, to remain a disruptive parallel file system that continues to accelerate extreme HPC environments, whilst adapting our technology for nontraditional HPC environments including AI, energy, and M&E.
Originating in 2005 at the Fraunhofer Institute for Industrial Mathematics, BeeGFS has for the past 10 years been developed and delivered globally by ThinkParQ. BeeGFS is developed on a 'Publicly Available Source Code' model with a strong focus on performance and community needs, and it currently holds over 10% market share of parallel file systems in academic/non-profit research. It is also trusted and used to accelerate some of the world's fastest supercomputers, and has worked its way to becoming the parallel file system of choice where performance matters.
Back to Session III
|
Quantum Computing and High-Performance
Computing: Rivals or Allies?
Rajeeb Hazra
QUANTINUUM, Broomfield, Colorado, USA
The rapid advancement of
quantum computing has sparked speculation about its potential to supplant
traditional high-performance computing (HPC) architectures. This keynote
delves into the pivotal question: Will quantum computing usurp HPC, or are
they destined to coexist as complementary technologies?
This keynote navigates the
convergence and divergence of quantum and classical computing paradigms. It
examines scenarios where quantum computing excels, such as cryptography and
optimization, while acknowledging the enduring relevance of HPC in domains like
weather forecasting, drug discovery, and engineering simulations. Moreover,
it explores synergistic possibilities where quantum accelerators enhance HPC
workflows, promising unprecedented computational power for scientific
discovery and technological innovation.
Back to Session I
|
Improving Future Climate Predictions with
Artificial Intelligence
Torsten
Hoefler
ETH Zurich, Full
Professor Department of Computer Science and Director Scalable Parallel
Computing Laboratory, Zurich, Switzerland
Artificial Intelligence and specifically Large
Language Models have had great impact on Science and Society at large. We
will show how those tools can be used in the context of one of humanity’s
hardest prediction challenges: the climate and future state of our planet. We
will discuss several ideas for accelerating weather and climate simulations,
using generative AI models for climate data compression, climate foundation
models, or diffusion-based operators for observation data assimilation. By harnessing these techniques, we aim to
significantly improve our understanding of future climate scenarios,
ultimately informing local and global strategies to mitigate climate change
and adapt to its effects.
Back to Session V
|
Social simulation with HPC and future Quantum
Computing
Nobuyasu Ito
RIKEN Center for
Computational Science, Kobe, Japan
Social phenomena are extremely complex, with a huge number of degrees of freedom, and the control and design of society present both great expectations and great challenges. Massively parallel supercomputers are useful for such purposes. Their performance scalability provides a flexible platform for data analysis and simulation.
Examples include vehicle traffic analysis, evacuation schedules, pandemic
preparedness, and macroeconomic design.
Back to
Session IX
|
Revolutionizing HPC and AI: The Power of
Wafer-Scale Systems
Michael James
CEREBRAS, Sunnyvale, CA, USA
Wafer-scale systems extend
the feasible space for physical simulations by multiple orders of magnitude
in strong scaling. Hundred-fold time-to-solution improvements put wafer-scale
supercomputers into a new class of scientific instruments that can provide
real-time HPC. Moreover, the computational architectures that provide strong
scaling for HPC workloads directly imply techniques for coupling simulations
with artificial intelligence.
In this talk, we will
describe the Cerebras wafer-scale platform, show
examples of hundred-fold accelerations, and introduce research directions for
AI at HPC scale.
Bio: Michael is
Founder and Chief Architect of Advanced Technologies at Cerebras,
the company that created the world’s largest and most powerful computer
processor. Michael leads the effort to reimagine the algorithmic building
blocks for the next generation of AI technologies. Prior to Cerebras, Michael was a Fellow at AMD, where he pioneered
a technique of adaptive and self-healing circuits based on cellular automata
that was applied toward distributed fault tolerant machines. Michael focuses
his career on exploration at the intersection of natural phenomena,
mathematics, and engineered machines. Michael's degree is in Molecular
Neurobiology, Computer Science and Mathematics from UC Berkeley.
Back to Session V
|
WACQT - the Swedish quantum computer effort
and testbed
Göran Johansson
Co-director WACQT and professor of Theoretical and
Applied Quantum Physics at Chalmers University of Technology in Gothenburg,
Sweden
In this talk I will give a
brief overview of the Wallenberg Center for Quantum
Technology (WACQT), which is a twelve-year, 120 M€ effort that started in 2018.
One of the two main goals
of this center is to build a Swedish
superconducting quantum computer and explore potential use-cases together
with our industrial partners.
In 2024 we also started a
testbed, where we let our Swedish researchers and industrial partners test
algorithms both on our own hardware and on IBM quantum computers.
Back to Session VII
|
Challenges of Deploying Emerging Computing
Technologies for U.S. Academic Research
Andrey Kanaev
U.S. National Science
Foundation, Program Director Office of Advanced Cyberinfrastructure Computer
and Information Science and Engineering Directorate, Alexandria, VA, USA
Novel computing paradigms are created in academic
laboratories, but their advent is driven by industry incentives and
investments. As a result, deployment of emerging technologies for scientific
computing at scale poses distinctive challenges: attracting users who are ready to adopt new ways to compute; discovering suitable application domains; allocating investments that are competitive with industry's levels; and estimating the scientific return on investment. Additionally, each novel paradigm, whether it is quantum, brain-inspired, etc., presents its own unique set of issues. In this talk we will share some of the opportunities the U.S. National
Science Foundation offers to academia to address these challenges.
Back to Session III
|
Performance evaluation of vector annealing on
NEC vector processor SX-Aurora TSUBASA
Hiroaki Kobayashi
Architecture Laboratory, Department of Computer
and Mathematical Sciences
Graduate School of Information Sciences, Tohoku
University, Japan
In this talk, I will introduce VE3.0, a vector annealer that is specially designed and implemented on NEC's vector computing platform, SX-Aurora TSUBASA, covering its features and performance evaluation results obtained using the traveling salesperson problem. I will also present the vector-quantum hybrid platform for the development of hybrid simulation and data-analysis applications. As an example, I will show the formulation of optimal rescue-resource deployment after a tsunami disaster and its performance evaluation.
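For background, a textbook way to hand the traveling salesperson problem to an annealer (the standard QUBO/Ising encoding, not necessarily the exact formulation used in this work) introduces binary variables x_{v,j} equal to 1 when city v is visited at step j and minimizes

    H = A \sum_{v}\Bigl(1 - \sum_{j} x_{v,j}\Bigr)^{2}
      + A \sum_{j}\Bigl(1 - \sum_{v} x_{v,j}\Bigr)^{2}
      + B \sum_{(u,v)} d_{uv} \sum_{j} x_{u,j}\, x_{v,j+1}

where d_{uv} are the inter-city distances, the first two terms enforce that every city appears exactly once and every step hosts exactly one city, and the penalty weight A is chosen sufficiently larger than B times the largest distance so that constraint violations are never energetically favorable.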
Back to Session VI
|
LUMI HPC Ecosystem – Today and Tomorrow
Kimmo
Koski
CSC - Finnish IT Center for Science, Espoo, Finland
LUMI – one of the most efficient supercomputers in Europe – is operating in Kajaani, in central Finland, in an old paper mill where a substantial amount of space and renewable energy is available. The system has been in production since early 2022 and is targeted to run at least until the end of 2027. LUMI is a joint effort of the European Union and 11 countries, coordinated by the Finnish IT Center for Science, CSC. The Finnish share of LUMI's total cost of 200 MEUR is 50 MEUR. This spring the Finnish government announced an investment of 250 MEUR for the follow-up supercomputer; gathering the consortium to procure and deploy the next system will thus start now.
The talk will cover the usage of the current LUMI and plans for the next one. It will describe the state of the ecosystem today and developments that will shape the future, including the use of the eco-efficient data center in Kajaani. The roles of traditional HPC, AI, and quantum computing are discussed, as well as European collaboration around these topics. A number of application examples are presented.
Back to Session III
|
Defining the quantum-accelerated
supercomputing at NVIDIA
Elica Kyoseva
Director Quantum Algorithms
Engineering, NVIDIA, Santa Clara, California, USA
Quantum computing has the potential to offer giant
leaps in computational capabilities, impacting a range of industries from
drug discovery to portfolio optimization. Realizing these benefits requires
pushing the boundaries of quantum information science in the development of
algorithms, research into more capable quantum processors, and the creation
of tightly integrated quantum-classical systems and tools. I will review the
challenges facing quantum computing, showcase how GPU computing can help, and reveal exciting developments
in tightly integrated quantum-classical computing.
Back to Session VI
|
The road to Quantum Advantage via Classical
Control and Integration
Lorenzo
Leandro
Quantum Machines inc., Milan, Italy
Key quantum algorithms that are expected to provide super-polynomial speed-ups hold immense strategic and economic potential. However, their full-scale practical implementation is still far away, requiring robust error correction and, with it, advanced control of both quantum and classical systems. In this talk, we delve into the intricacies of running such key
algorithms on an error-corrected quantum computer from a control and
classical integration standpoint. We do this by looking at a simulated end-to-end
example of running Shor’s algorithm to factorize the number 21 within a
quantum error correction code on a superconducting QPU. By analyzing the
algorithms’ resource requirements, gate fidelity, and noise tolerance, we
derive essential criteria for designing an effective quantum control system
that will do the job, and we outline what type of quantum-classical
integration will get us there.
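For orientation, the classical book-keeping around the quantum core of that example looks as follows; this sketch brute-forces the period that the error-corrected QPU would obtain via quantum order finding, so it only illustrates the arithmetic for N = 21, not the control system discussed in the talk.

    from math import gcd

    N, a = 21, 2                     # number to factor and a coprime base
    assert gcd(a, N) == 1

    # Order finding: the step a fault-tolerant QPU performs with quantum
    # phase estimation; here it is brute-forced classically for illustration.
    r = next(r for r in range(1, N) if pow(a, r, N) == 1)   # r = 6

    # Classical post-processing: r is even and a**(r//2) != -1 (mod N), so the
    # gcds below yield the nontrivial factors of 21.
    assert r % 2 == 0 and pow(a, r // 2, N) != N - 1
    y = pow(a, r // 2, N)                                    # 2**3 mod 21 = 8
    print("period:", r, "factors:", sorted((gcd(y - 1, N), gcd(y + 1, N))))  # 3, 7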
Back to Session VIII
|
Application Driven Optimizations in
High-Performance Interconnects for Supercomputing
Yutong Lu
Full Professor, School of Computer Science and
Engineering, Director, National Supercomputer Center in Guangzhou, Sun
Yat-Sen University, Guangzhou Higher education Mega Center, Guangzhou, China
The interconnect network
is a crucial component of large-scale supercomputing systems. As
supercomputing systems continue to progress, networks have been consistently
optimized. It should be noted that the ultimate goal
of all network optimizations is to serve applications. The architecture of
the network must evolve according to the communication characteristics of
applications, simultaneously eliminating redundancies to minimize unnecessary
costs. For communication middleware, it is essential to provide a better
abstraction for the network while retaining excellent performance of
underlying hardware. For applications, the design of communication schemes
should fully exploit the hardware and software features of networks. Thus, we
proposed a Unified Notifiable RMA (UNR) library to address these challenges.
Our evaluation demonstrates the performance improvements of the domain
applications on domestic supercomputers. As large-scale model training
emerges as a pivotal application in today’s supercomputing systems, we are
concentrating on critical network optimization techniques for large-scale
computing.
Back to Session II
|
Quantum Annealing Today: Updates from D-Wave
Irwan Owen
D-Wave Systems Inc., Germany and USA
Over the last few years,
D-Wave Quantum has seen customers moving from research projects in the lab to
production applications that provide business value. Our commercial-scale
hybrid solvers, real-time cloud access, and new features are enabling enterprise
and research organizations to leverage quantum technologies in more and more
ways. Join us in this session to hear about the latest results from our
customers, as well as updates and new features from D-Wave.
Back to Session VII
|
Developing a Quantum Computer with Tunable
Couplers
Riccardo
Manenti
Rigetti
Computing, Berkeley, CA, USA
As the field of quantum
computing advances, the demand for devices with higher performance and
greater qubit counts becomes more pressing. In this talk, I will outline the
evolution of our qubit architecture and elaborate on our strategy for scaling
quantum devices using superconducting qubits. I will introduce our tunable
coupler architecture and explain our implementation of parametric entangling
gates. Additionally, I will discuss the challenges in scaling, particularly
our efforts in integrating on-chip flux and microwave lines,
and present our modular approach.
Back to Session VIII
|
Moving Beyond QPU as an Accelerator:
Embracing Non-Von Neumann Approaches in Quantum
Programming Models
Stefano Markidis
KTH Royal Institute of Technology, Computer
Science Department, Stockholm, Sweden
The design of quantum
programming models has traditionally been grounded in the conceptual
framework of quantum circuits and gates introduced by David Deutsch in the
early 1980s. This framework typically envisions the Quantum Processing Unit
(QPU) as an accelerator within a host-device configuration, where the host
system offloads the program to the QPU for execution. However, quantum
computers predominantly consist of classical systems that stimulate and
measure quantum systems as black boxes, diverging significantly from the
circuit-offloading model. This abstraction is misaligned with the hardware’s
operational reality, hindering optimized implementations and limiting the
scope of operations. In contrast, concepts from non-Von Neumann
architectures—such as neuromorphic hardware and dataflow systems—utilize
abstractions like stimuli, channels, and schedules, which better align with
the nature of quantum computing systems and the physical processes they
embody. As David Deutsch originally conceptualized, computing is
fundamentally a physical process. Thus, advancing quantum programming models
should incorporate this perspective to achieve greater accuracy and physical
fidelity. By adopting physics-based programming models, we can develop
approaches to quantum computing that more accurately reflect the interactions
between classical hardware and quantum systems.
Back to Session VIII
|
Riken TRIP-AGIS and FugakuNEXT
- greatly accelerating next generation AI for Science
Satoshi Matsuoka
RIKEN Director Center
for Computational Science, Kobe and Department of Mathematical and Computing
Sciences Tokyo Institute of Technology, Tokyo, Japan
AI for Science, leveraging
high-performance computing (HPC), is set to transform scientific endeavors, accelerating innovation and societal benefits.
HPC and AI, once niche areas, are now pivotal in computer science, fueled by substantial investments in talent and
resources. This shift is evident in initiatives like Fugaku-LLM
in Japan, which utilizes 14,000 nodes of the Fugaku
supercomputer to train large-scale language models, emphasizing skill in
managing massive training operations. Concurrently, the TRIP-AGIS project at
Riken aims to integrate AI with simulation and automated experiments,
standardizing this approach across sciences in Japan to enhance innovation
cycles. These initiatives not only guide the development of the
next-generation FugakuNEXT supercomputer but also
explore key technical challenges such as optimizing data movement to boost
efficiency and capacity. These efforts are critical for advancing both AI and
traditional simulations in the upcoming post-exascale era.
Back to
Session I
|
Silicon chips made quantum
John Morton
Professor University College London – UCL,
Director of UCL Quantum Science and Technology Institute, and Co-Founder and
CTO of QUANTUM MOTION London, UK
Silicon MOS dominates
today’s information technology industry, having repeatedly replaced the
incumbent technology platform in diverse applications, but what will its role
be in quantum computing? Spins in silicon offer some of the longest quantum
coherence times of any solid-state system while cryogenic CMOS circuits of
increasing complexity have been designed and demonstrated to run at deep
cryogenic temperatures, opening a route to tightly integrating control
electronics with quantum devices. MOS devices fabricated on 300mm wafers, similar to those used in the silicon CMOS transistor
industry today, can be used to form spin qubit arrays capable of implementing
versatile quantum computing architectures. I will discuss recent progress at
Quantum Motion on MOS spin qubit devices fabricated using industrial grade
300mm wafer processing and their integration with cryogenic CMOS electronics,
showing how silicon could play a major role in the future quantum computing
industry. I will show how arrays of up to 1024 Si quantum dots can be
addressed on-chip using digital and analogue electronics and characterised
within 5 minutes, present MOS spin qubit readout fidelities in excess of 99.9% and exchange oscillations which form
the basis of two-qubit entangling gates. I will also discuss prospects for
different QC architectures based on MOS spin qubits covering the NISQ and
FTQC regimes, and requirements for control electronics.
Back to Session VII
|
Breaking the Memory Wall for Generative AI
Systems
Martin
Mueller
SambaNova Systems Inc, Palo Alto, CA, USA
Composition of Experts is an alternative approach to lowering the cost and complexity of training and serving very large AI (language) models, and to overcoming the memory wall caused by the increasing compute-to-memory ratio of modern AI accelerators. This talk describes how composition of experts, streaming dataflow, and a three-tier memory architecture can scale the memory wall.
Back to Session V
|
Toward Utility Scale Quantum Computing
Applications in Physical Science
Kevin
Obenland
Quantum Information and
Integrated Nanosystems, Lincoln Laboratory,
Massachusetts Institute of Technology MIT, Boston, MA, USA
Quantum computing provides a fundamentally new
capability that has the promise of accelerating the development of
applications in physical science. These applications include:
quantum chemistry, condensed matter systems, and high-energy-density physics,
among others. In order to assess the capabilities of
quantum computing for these applications we must identify specific problems
and parameter regimes, develop workflows that leverage quantum computing
algorithms, and assess the resources required by quantum computing
implementations used in the workflows. As part of the DARPA Quantum
Benchmarking program, MIT Lincoln Laboratory is actively developing a tool
called pyLIQTR, which provides implementations of
important quantum kernels used in the workflows of applications
in physical science. With the implementations provided by our tool, one can
measure the quantum resources required for applications at utility scale. In
this talk, I will describe the pyLIQTR tool and
show resource analysis results for problems that include:
local and periodic quantum chemistry, the Fermi-Hubbard model, and plasma
physics.
Back to Session VII
|
Charting Your Path
to Fault Tolerant Quantum Computing with Quantinuum
Nash Palaniswamy
QUANTINUUM, Broomfield, Colorado, USA
In this talk, we will
chart Quantinuum's path to true fault-tolerant
quantum computing, highlighting the critical advancements and milestones in
our fully integrated hardware and software stack. We will delve into the
latest technical progress in our QCCD Architecture, including achieving 99.9%
fidelity, addressing scalability, and introducing the first and only Level 2
resilient quantum computer with Microsoft.
The talk will illustrate
how these innovations support our journey towards fault tolerance through
real-world use cases of commercial importance across various industries, such
as fuel cell catalytic reactions, high-resolution seismic imaging, materials for
carbon capture, ammonia catalysis, quantum natural language processing for
peptide binding analysis, and fraud detection.
We will conclude with a
forward-looking perspective on our roadmap, outlining the steps we are taking
to achieve fault-tolerant quantum computing and the transformative potential
it holds for the future.
Back to Session VIII
|
Harnessing the Edge for Science
Manish
Parashar
Scientific Computing and
Imaging Institute and School of Computing University of Utah, Salt Lake City,
USA
Recent advances in edge devices are enabling data-driven, AI-enabled scientific workflows that integrate distributed data sources. Combined with pervasively available computing resources, spanning HPC to the edge, these workflows can help us understand end-to-end phenomena, drive experimentation, and facilitate important decision making.
However, despite the growth of available digital data sources at the edge,
and the ubiquity of non-trivial computational power for processing this data,
realizing such science workflows remains challenging. This talk will explore
a computing continuum spanning resources at the edge, in HPC centers and clouds, and in between, which provides abstractions that can be harnessed to support science. The talk will also
introduce recent research in programming abstractions that can express what
data should be processed and when and where it should be processed, and
autonomic middleware services that automate the discovery of resources and
the orchestration of computations across these resources.
Back to Session X
|
Advancements in HPC Integration with Quantum
Brilliance’s Room-Temperature Quantum Accelerators
Florian Preis
Quantum Brilliance GmbH, Stuttgart, Germany
In this talk, we will
delve into the latest developments in the field of quantum accelerators by
Quantum Brilliance, based on the use of NV centers in diamond to operate at
room temperature. The centerpiece of Quantum Brilliance's ongoing integration
work is the Quantum Brilliance QDK2.0, the latest version of their quantum
accelerator, which represents a significant leap forward in the practical
integration of quantum and classical computing. We will explore current HPC
integration projects that leverage the unique capabilities of the QDK.
Furthermore, we will discuss the different levels of classical
parallelization of quantum computations, which are crucial for maximizing the
efficiency and scalability of hybrid computing systems. By examining these
advancements, we aim to provide a comprehensive overview of the current
landscape and future directions for practical quantum computing.
Back to Session VII
|
Neutral Atoms at the Kiloqubit
Scale
Kristen Pudenz
Vice President of Research Collaborations, Atom
Computing, Berkeley, California, USA
Atom Computing has
demonstrated 1225 neutral atom qubits loaded in a computational array. We
will explore the technology behind this milestone, other novel technology
developed at Atom Computing, and address future development and opportunities
for collaboration.
Back to Session VI
|
CGRA Architectures for High-Performance
Computing and AI
Kentaro Sano
Team Leader, Processor Research Team, Center for
Computational Science, RIKEN, Japan
At RIKEN Center for
Computational Science (R-CCS), we have been researching future architectures
for HPC and AI. In particular, in the Processor Research Team, we are focusing on
reconfigurable computing architectures such as coarse-grained reconfigurable
array (CGRA), which can be advantageous due to limited data movement, resulting
in lower power consumption. In this talk, we introduce the concept of CGRA
and our research on RIKEN CGRA for HPC and AI with architectural exploration
for more efficient computing.
Bio:
Kentaro Sano has been the
team leader of the processor research team at RIKEN Center for Computational Science
(R-CCS) since 2017, responsible for research and development of future
high-performance processors and systems. He is also a visiting professor with
an advanced computing system laboratory at Tohoku University. He received his
Ph.D. from the graduate school of information sciences, Tohoku University, in
2000. From 2000 until 2018, he was a Research Associate and an Associate
Professor at Tohoku University. He was a visiting researcher at the
Department of Computing, Imperial College, London, and Maxeler
Technology corporation in 2006 and 2007. His research interests include
data-driven and spatial-parallel processor architectures such as a
coarse-grain reconfigurable array (CGRA), FPGA-based high-performance
reconfigurable computing, high-level synthesis compilers and tools for
reconfigurable custom computing machines, and system architectures for
next-generation supercomputing based on the data-flow
computing model.
Back to Session X
|
Drug design on quantum computers
Raffaele
Santagati
Quantum Computing
Scientist, Boehringer Ingelheim, Germany
The promising industrial applications of quantum
computers primarily rely on their anticipated ability to conduct precise and
efficient quantum chemical calculations. In computational drug discovery, the
accurate prediction of drug-protein interactions is paramount [1]. However,
several notable challenges need to be overcome to apply quantum computers to
drug design effectively.
First, efficiently computing expectation values for
observables beyond total energy is a significant challenge in fault-tolerant
quantum computing. Currently, quantum algorithms rely on nested quantum phase
estimation subroutines to calculate the expectation value of observables [2,
3]. Although quantum phase estimation is highly efficient, the frequent need
for nested quantum
phase estimations creates a bottleneck that makes computing observables
prohibitively expensive, even with the latest algorithmic advancements [4,
5]. This limitation presents a significant hurdle for quantum computing
applications in the pharmaceutical industry.
Secondly, molecular simulations at finite
temperatures are key in free energy calculations. These calculations are
crucial for determining thermodynamic quantities such as binding affinities.
However, this process can be pretty complex and
challenging due to the vast number of configurations needed. Millions of
calculations are typically required, each with a quantum computing run time
of several days, making it difficult to compete with the run times of
optimized experiments. Nevertheless, quantum computing has the potential to
provide an alternative solution [6]. For example, by simultaneously modeling
classical nuclei and quantum mechanical electrons on a quantum computer, it
may be possible to calculate thermodynamic quantities more practically and
efficiently. It may even be possible to generate thermal ensembles of
geometries and calculate thermodynamic properties like free energies directly
on a quantum computer. By overcoming these challenges, we could significantly
enhance the efficiency and applicability of molecular simulations at finite
temperatures. This could profoundly impact computational drug discovery in
the pharmaceutical industry.
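As a reminder of why these thermodynamic quantities matter (a standard textbook relation, included here only as background and not a result of the cited work), the binding free energy is tied to the measured dissociation constant K_d by

    \Delta G_{\mathrm{bind}} = R T \ln\!\left(\frac{K_d}{c^{\ominus}}\right)

so that at room temperature an error of only RT ln 10, about 1.4 kcal/mol, in the computed free energy already shifts the predicted affinity by an order of magnitude, which is why so many high-accuracy evaluations are needed.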
This talk will explore some of these challenges and
discuss potential new routes for applying quantum computers to drug design.
[1] R. Santagati, A. Aspuru-Guzik, R. Babbush, M. Degroote, L. González, E. Kyoseva, N. Moll, M. Oppel, R. M. Parrish, N. C. Rubin, M. Streif, C. S. Tautermann, H. Weiss, N. Wiebe, and C. Utschig-Utschig, Nature Physics, 1 (2024).
[2] M. Steudtner, S. Morley-Short, W. Pol, S. Sim, C. L. Cortes, M. Loipersberger, R. M. Parrish, M. Degroote, N. Moll, R. Santagati, and M. Streif, Quantum 7, 1164 (2023).
[3] T. E. O'Brien, M. Streif, N. C. Rubin, R. Santagati, Y. Su, W. J. Huggins, J. J. Goings, N. Moll, E. Kyoseva, M. Degroote, C. S. Tautermann, J. Lee, D. W. Berry, N. Wiebe, and R. Babbush, Phys. Rev. Res. 4, 043210 (2022).
[4] P. J. Ollitrault, C. L. Cortes, J. F. Gonthier, R. M. Parrish, D. Rocca, G.-L. Anselmetti, M. Degroote, N. Moll, R. Santagati, and M. Streif, Enhancing initial state overlap through orbital optimization for faster molecular electronic ground-state energy estimation (2024), arXiv:2404.08565 [quant-ph].
[5] D. Rocca, C. L. Cortes, J. Gonthier, P. J. Ollitrault, R. M. Parrish, G.-L. Anselmetti, M. Degroote, N. Moll, R. Santagati, and M. Streif, Reducing the runtime of fault-tolerant quantum simulations in chemistry through symmetry-compressed double factorization (2024), arXiv:2403.03502 [quant-ph].
[6] S. Simon, R. Santagati, M. Degroote, N. Moll, M. Streif, and N. Wiebe, PRX Quantum 5, 010343 (2024).
Back to Session IX
|
Scaling AI for
Science
Anna Scaife
University of Manchester, Manchester,
UK
The
neural scaling laws that have motivated the current generation of large AI
models (such as GPT-n) suggest that larger models trained on more data will
perform better. But while the data supporting these laws in supervised
computer vision is drawn from experiments with ImageNet or ImageNet-like
datasets, these standard benchmark datasets are highly curated. Here we ask:
Are these scaling laws reliable for practitioners in fields like cell
biology, medical imaging, remote sensing, etc., who work with qualitatively
different data types? Here I will present the first systematic investigation
of supervised scaling laws outside of an ImageNet-like context – on images of
galaxies. We use 840k galaxy images and over 100M annotations by Galaxy Zoo
volunteers, comparable in scale to ImageNet-1K. We find that while adding
annotated galaxy images provides a power law improvement in performance
across all architectures and all tasks, adding trainable parameters is
effective only for some tasks. By comparing the downstream performance of
finetuned models pretrained on either ImageNet-12k alone vs. additionally
pretrained on our galaxy images we show that our finetuned models are more
label-efficient and, unlike their ImageNet-12k-pretrained equivalents, often
achieve linear transfer performance equal to that of end-to-end finetuning.
We find relatively modest additional downstream benefits from scaling model
size, implying that scaling alone is not sufficient to address our domain
gap, and suggest that other scientific fields with qualitatively different
data from ImageNet might benefit more from in-domain adaption followed by
targeted downstream labelling.
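For reference, the neural scaling laws referred to above are usually summarized, in their simplest empirical form (quoted here as background; the exponents and constants below are fitted quantities, not results from this work), as a power-law decay of test loss with dataset size N and parameter count P:

    L(N, P) \approx \left(\frac{N_c}{N}\right)^{\alpha_N} + \left(\frac{P_c}{P}\right)^{\alpha_P} + L_{\infty}

The study described above finds that the data term does follow such a power law for galaxy images, while scaling the parameter count helps only for some tasks.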
Back
to Session IV
|
Launching the Grace Hopper Superchip on the
‘Alps’ Cloud-Native Supercomputer
Thomas Schulthess
CSCS Swiss National Supercomputing Centre, Lugano
and ETH, Zurich,
Switzerland
The 'Alps' cloud-native supercomputing
infrastructure, leveraging HPE’s Cray Shasta EX product line, features
versatile software-defined clusters (vClusters)
configured via partitions of the Slingshot network to accommodate diverse
research needs. These vClusters support various
applications from traditional HPC workloads to high-throughput tasks for the
World LHC Compute Grid and the Materials Cloud. Recently, MeteoSwiss’s
ICON-22 model commenced operations on 'Alps' utilizing a geo-distributed
configuration. This presentation will detail the deployment of NVIDIA’s Grace
Hopper superchip (GH200) within these settings. The GH200-based CG4 nodes,
integral to this recent extension, combine four
'superchips' connected by NVLink, achieving a
balanced memory system and promising performance as demonstrated by initial
applications. Despite these advances, the high energy density of the system
presents significant challenges, primarily due to increased electrical power
consumption. The system is very efficient and most
applications run at peak power.
Back to Session X
|
Progress towards large-scale fault-tolerant
quantum computing with photons
Pete
Shadbolt
Co-Founder PsiQuantum, Palo Alto, California, USA
In this talk we will
describe progress towards large-scale, fault-tolerant quantum computing with
photons. This talk will span materials innovations for high-performance
photonics, improvements in photonic component performance with an emphasis on
improved optical loss, prototype systems of entangled photonic qubits, qubit
networking, and novel high-power cryogenic cooling solutions designed for
future datacenter-scale quantum computers. We will show new prototype systems
designed to progressively overcome the key challenges to scaling up photonic
quantum computers. We will also give an overview of the architecture of
fusion-based photonic quantum computers, describe near-term systems
milestones, and give a view on the long-term roadmap to useful, fault-tolerant
machines.
Back to Session VIII
|
The Decade Ahead:
Building Frontier AI Systems for Science and the Path to Zettascale
Rick Stevens
Argonne National
Laboratory, University of Chicago, USA
The
successful development of transformative applications of AI for science,
medicine and energy research will have a profound impact on the world. The rate of development of AI capabilities
continues to accelerate, and the scientific community is becoming
increasingly agile in using AI, leading us to anticipate significant
changes in how science and engineering goals will be pursued in the future.
Frontier AI (the leading edge of AI systems) enables small teams to conduct
increasingly complex investigations, accelerating some tasks such as
generating hypotheses, writing code, or automating entire scientific
campaigns. However, certain challenges remain resistant to AI acceleration
such as human-to-human communication, large-scale systems integration, and
assessing creative contributions. Taken together these developments signify a
shift toward more capital-intensive science, as productivity gains from AI
will drive resource allocations to groups that can effectively leverage AI
into scientific outputs, while others will lag. In addition, with AI becoming the major driver of innovation in high-performance computing, we also expect major shifts in the computing marketplace over the next decade; we see a growing performance gap between systems designed for traditional scientific computing and those optimized for large-scale AI such as Large Language Models. In part as a response to these trends, but also in recognition of the role of government-supported research in shaping the future research landscape, the U.S. Department of Energy has created the FASST (Frontier AI for Science, Security and Technology) initiative.
FASST is a decadal research and infrastructure development initiative
aimed at accelerating the creation and deployment of frontier AI systems for
science, energy research, and national security.
I will review the goals of FASST and how we imagine it transforming
the research at the national laboratories.
Along with FASST, I’ll discuss the goals of the recently established Trillion Parameter
Consortium
(TPC), whose aim is to foster a community-wide effort to accelerate the creation of large-scale generative AI for science. Additionally, I'll introduce the AuroraGPT project, an international collaboration to build a series of multilingual, multimodal foundation models for science that are pretrained on deep domain knowledge to enable them to
play key roles in future scientific enterprises.
Back to Session I
|
HPC and Machine Learning for Molecular
Biology: ADMIRRAL Project Update
Frederick Streitz
Center for Forecasting and Outbreak Analytics
(CFA/CDC), USA and National AI Research Resource Task Force (NAIRR-TF) USA
and Lawrence Livermore, National Laboratory (LLNL/DOE), Livermore,
California, USA
The joint application of high performance computing (HPC) and Machine Learning (ML)
has enabled advances in a number of scientific disciplines. One of the
most powerful demonstrations has been in the area of
computational biology, where the addition of ML techniques has helped
ameliorate the lack of clear mechanistic models and often poor statistics
that have impeded progress in our understanding. I will discuss progress in
the development of a hybrid ML/HPC approach to investigate the behavior of an
oncogenic protein on cellular membranes in the context of the ADMIRRAL
(AI-Driven Machine-learned Investigation of RAS-RAF Activation Lifecycle)
Project, a collaboration between the US Department of Energy and the National
Cancer Institute.
Back to Session II
|
Provable Advantage in Quantum PAC Learning
Sergii Strelchuk
Department of Applied Mathematics and Theoretical
Physics and Centre for Quantum Information and Foundations University of
Cambridge and University of Warwick, Computer Science Department, Warwick
Quantum Centre, UK
In this talk I will
provide a gentle introduction to PAC learning and revisit the problem of characterising the complexity of Quantum PAC learning, as
introduced by Bshouty and Jackson [SIAM J. Comput. 1998, 28, 1136–1153]. Several quantum advantages
have been demonstrated in this setting; however,
none are generic: they apply to particular concept classes and typically only
work when the distribution that generates the data is known. In the general
case, it was recently shown by Arunachalam and de Wolf [JMLR, 19 (2018) 1-36]
that quantum PAC learners can only achieve constant factor advantages over
classical PAC learners.
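For context, a brief sketch of the sample-complexity bounds behind this "constant factor" statement (standard realizable-case PAC notation; \(d\) is the VC dimension of the concept class, \(\epsilon\) the target error, \(\delta\) the failure probability; the quantum bound is the Arunachalam-de Wolf result for learning from quantum examples):
\[
m_{\mathrm{classical}}(\epsilon,\delta) = \Theta\!\left(\frac{d}{\epsilon} + \frac{\log(1/\delta)}{\epsilon}\right),
\qquad
m_{\mathrm{quantum}}(\epsilon,\delta) = \Theta\!\left(\frac{d}{\epsilon} + \frac{\log(1/\delta)}{\epsilon}\right),
\]
i.e., the two coincide up to constant factors, so any generic quantum advantage must come from extending the learning model itself.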
We show that with a
natural extension of the definition of quantum PAC learning used by
Arunachalam and de Wolf, we can achieve a generic advantage in quantum
learning.
The talk is based on https://eccc.weizmann.ac.il/report/2023/142/
Back to Session VII
|
Modular
Supercomputing, HPC and AI
Estela
Suarez
Juelich Research Center,
Juelich, Germany
For the major technology providers, HPC is a small market with a low return on
investment, for which it does not pay off to develop specific products.
Therefore, over the last 30 years they have designed their products with the
much larger volumes of the server market in mind, and we have taken them
off-the-shelf to build HPC clusters. Now the industry has moved its focus
towards the cloud and, most recently, towards the exploding AI market
dominated by hyperscalers, who build their own devices and dictate the design
of upcoming CPUs and accelerators, which are tailored to the requirements of
AI applications. To benefit from these new products, we in HPC must ask
ourselves how the specific requirements of our traditional HPC applications
differ from those of the most popular AI models, and how we can make them
compatible with each other. We also need to rethink what HPC systems should
look like to serve the needs of both the HPC and AI application domains. In this
talk, we discuss how the Modular Supercomputing Architecture can be a vehicle
to achieve this goal.
Bio
Prof. Dr. Estela Suarez is Joint Lead of the department “Novel
System Architecture Design” at the Jülich Supercomputing Centre, which she
joined in 2010. Since 2022 she has also been Associate Professor of High
Performance Computing at the University of Bonn, and a member of the RIAG
(Research and Innovation Advisory Group of the EuroHPC JU). Her research
focuses on HPC system architecture and codesign. As leader of the DEEP
project series she has driven the development of the
Modular Supercomputing Architecture, including hardware, software and
application implementation and validation. She also leads the codesign
efforts within the European Processor Initiative. She holds a PhD in Physics
from the University of Geneva (Switzerland) and a Master's degree in
Astrophysics from the Complutense University of Madrid (Spain).
Back to Session IV
|
Breaking the HPC Communication Wall with Tightly-coupled Supernodes
Samantika Sury
SAMSUNG Electronics
America, Westford, MA, USA
Today's large-scale HPC
systems generally provide high-performance heterogeneous nodes connected via a
high-performance network fabric such as Ethernet or InfiniBand.
One challenge with such a system architecture is that utilizing the
accelerators inside a node is still difficult due to the costs of
offloading and data movement. Another challenge is
the significant performance cliff once communication leaves the node, due to
bandwidth, latency and software overheads. A more scalable system architecture in
HPC and AI is possible through the aggregation of tightly-coupled
nodes into a “Supernode” augmented with a memory
model for productive programming before accessing a scale-out network
fabric. With industry innovations like NVLink, CXL 3.0 and UALink,
the future of datacenters is also trending in this
direction and joint innovation in this area will be key to future scalable
system architectures. Keeping in mind the theme of the workshop “State of the
Art, Emerging Disruptive Innovations and Future Scenarios in HPC”, this talk
will discuss the value proposition of tightly-coupled
Supernodes to improve communication for HPC and AI,
some industry trends that are driving this direction and point out some
challenges to overcome.
Back to Session III
|
Accelerating Progress in Delivering Clean
Energy Fusion for the World with AI, ML, and Exascale Computing
William
Tang
Princeton University
Dept. of Astrophysical Sciences, Princeton Plasma Physics Laboratory; Center for Statistics and Machine Learning (CSML) and
Princeton Institute for Computational Science & Engineering (PICSciE), Princeton University, USA
The US goal (March 2022)
to deliver a Fusion Pilot Plant [1] has underscored the urgency of accelerating
the fusion energy development timeline. Validated scientific and engineering
advances driven by Exascale Computing together with advanced statistical
methods featuring artificial intelligence/deep learning/machine learning
(AI/DL/ML) must properly embrace Verification, Validation, and Uncertainty
Quantification (VVUQ) to truly establish credibility. Especially time-urgent in the Clean Energy
Fusion grand challenge application domain is the need to predict and avoid
large-scale “major disruptions” in tokamak systems.
Disruption prediction has enjoyed great progress through the use of
high-dimensional signals, modern deep learning methods, and multi-device
training and testing. We expect accelerated progress through additional
architectural improvements such as transformers, as well as multi-time-scale
models (e.g., temporal convolutions) that take advantage of the wide range of
natural temporal scales in the measured diagnostic signals. Integrating
additional multimodal signals (such as frequency-domain signals, ECEi
data, 2D radiation profiles, etc.) into a single model provides further
opportunities for performance improvement.
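As a purely illustrative, hypothetical sketch of the multi-time-scale temporal-convolution idea mentioned above (not the FRNN implementation; PyTorch and all names here are assumptions), a dilated 1D-convolution stack over diagnostic-signal channels that emits a per-time-step disruption score could look like:

# Hypothetical sketch of a multi-scale temporal-convolution disruption
# predictor; illustrative only, not the FRNN code discussed in the talk.
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    def __init__(self, n_channels: int, hidden: int = 64, levels: int = 4):
        super().__init__()
        layers, in_ch = [], n_channels
        for i in range(levels):
            dilation = 2 ** i  # doubling dilation covers progressively longer time scales
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3,
                                 padding=dilation, dilation=dilation),
                       nn.ReLU()]
            in_ch = hidden
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)  # per-time-step score

    def forward(self, x):  # x: (batch, channels, time)
        return torch.sigmoid(self.head(self.tcn(x)))  # "disruption score" in [0, 1]

# Example: 16 diagnostic channels sampled over 1000 time steps.
model = DilatedTCN(n_channels=16)
scores = model(torch.randn(8, 16, 1000))  # -> (8, 1, 1000)

Each dilation level roughly doubles the temporal receptive field, which is one simple way to cover the wide range of natural time scales in the diagnostic signals within a single model.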
Foundation model-type efforts have especially promising potential for
impact. The general framework for enabling major advances can come from the
rapidly evolving LLMs and image recognition models. Associated
international R&D efforts such as the "Trillion Parameter
Consortium" [https://tpc.dev/tpc-european-kick-off-workshop] are already
focusing on the training of multi-billion parameter models on a mix of
experimental and simulation data. With rapidly advancing modern technology,
this can quickly lead to the fine-tuning of huge models multiple times into
several smaller distilled models in the category of “Multitask Learning for
Complex & Diverse Control Needs.”
This presentation will highlight the deployment of
recurrent and convolutional neural networks in Princeton's Deep Learning code
"FRNN", which enabled the first adaptable predictive DL model
for carrying out efficient "transfer learning" while delivering
validated predictions of disruptive events across major international
tokamak devices [2]. Moreover, the
AI/DL capability -- in an "understandable sense" -- can provide not
only a “disruption score,” as an indicator of the probability of an
imminent disruption, but also a “sensitivity score” in real time to indicate
the underlying reasons for the predicted disruption [3]. A real-time prediction and control
capability has recently been significantly advanced with a novel surrogate
model/HPC simulator ("SGTC") [4] -- a first-principles-based
prediction and control surrogate necessary for projections to future
experimental devices (e.g., ITER, FPPs) for which no "ground
truth" observational data exist.
Finally, an exciting and rapidly developing area
that cross-cuts engineering design with advanced
visualization capabilities involves AI-enabled advances in Digital Twins,
with the FES domain providing stimulating exemplars. This has also witnessed
prominent recent illustrations of the increasingly active collaborations with
leading industry partners such as NVIDIA, which enabled productive advances
for tokamak digital twins with dynamic animations of the advanced AI-enabled
surrogate model SGTC [4] and NVIDIA's "Omniverse" visualization tool
[5]. More generally, the scientific
merits of Digital Twins are well analyzed in the
recent US National Academies report on “Foundational Research Gaps and Future
Directions for Digital Twins” [6].
REFERENCES:
[1]
https://www.whitehouse.gov/ostp/news-updates/2022/04/19/readout-of-the-white-house-summit-on-developing-a-bold-decadal-vision-for-commercial-fusion-energy/
[2] Julian Kates-Harbeck, Alexey Svyatkovskiy, and William Tang,
"Predicting Disruptive Instabilities in Controlled Fusion Plasmas
Through Deep Learning," Nature 568, 526 (2019).
[3] William Tang et al., Special Issue on Machine Learning Methods in Plasma
Physics, Contributions to Plasma Physics (CPP), Volume 63, Issue 5-6 (2023).
[4] Ge Dong et al., "Deep Learning-Based Surrogate Model for First-Principles
Global Simulations of Fusion Plasmas," Nuclear Fusion 61, 126061 (2021).
[5] William Tang et al., "AI-Machine Learning-Enabled Tokamak Digital Twin,"
Proceedings of the 2023 IAEA FEC, London, UK (2023).
[6]
https://www.nationalacademies.org/our-work/foundational-research-gaps-and-future-directions-for-digital-twins
(2023).
Back to Session IV
|
The National Science Data Fabric:
Democratizing Data Access for Science and Society
Michela
Taufer
The University of
Tennessee, Electrical Engineering and Computer Science Dept. Knoxville, TN,
USA
The National Science Data Fabric (NSDF) pilot
project is a transformative initiative to democratize data-driven sciences
through a cyberinfrastructure platform that ensures equitable access. By
integrating a programmable Content Delivery Network (CDN), NSDF achieves
interoperability across various computing environments, enabling seamless
computing, storage, and networking integration. This strategy enables the
efficient development of community-driven solutions and domain-specific
advancements. A key element of NSDF’s approach is its dedication to community
education and outreach, especially through collaborations with
minority-serving institutions, to ensure widespread access. Our presentation
will introduce the shared, modular, and containerized NSDF environment,
designed to bridge significant gaps in the national computational
infrastructure and tackle the ‘missing millions’ in STEM talent. We will
highlight NSDF’s commitment to fostering an inclusive, diverse workforce and
its efforts towards collective success in various fields, including material
sciences, astrophysics, and earth sciences. Through testimonials and live
demonstrations, we will showcase the impactful services provided by NSDF to
support global science and engineering goals and to engage the broader
scientific community effectively.
Vita:
Dr. Michela Taufer is an AAAS Fellow and ACM Distinguished Scientist; she
holds the Dongarra Professorship in High-Performance Computing in the
Department of Electrical Engineering and Computer Science at the University
of Tennessee Knoxville (UTK). She earned her undergraduate degree in Computer
Engineering from the University of Padova (Italy) and her doctoral degree in
Computer Science from the Swiss Federal Institute of Technology (ETH) Zurich
(Switzerland). From 2003 to 2004, she was a La Jolla Interfaces in Science
Training Program (LJIS) Postdoctoral Fellow at the University of California
San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked
on interdisciplinary projects in computer systems and computational
chemistry.
Dr. Taufer’s commitment to interdisciplinary collaboration has been a
constant throughout her career, with a particular passion for connecting
computational and experimental sciences. Her research targets designing and
implementing cyberinfrastructure solutions that leverage high-performance
computing, cloud computing, and volunteer computing. She also focuses on the
application of HPC in artificial intelligence and machine learning, is
dedicated to enhancing algorithms and workflows for scientific applications,
and advocates for reproducibility, replicability, and transparency in
scientific research while pushing the boundaries of in situ and in transit
data analytics.
Dr. Taufer has led several National Science Foundation collaborative
projects and has served in leadership roles at HPC conferences such as ISC
and IEEE/ACM SC. Beyond research and leadership, Dr.
Taufer has been influential on steering committees and editorial boards,
currently serving as the editor-in-chief of the journal Future Generation
Computer Systems. Her commitment to growing a diverse and inclusive community
of scholars is evident in her mentorship of students across a spectrum of
interdisciplinary research.
Back to Session III
|
AI is changing our
world but can it be run sustainably?
Scott Tease
Lenovo, Vice President HPC
and AI, Morrisville, NC, USA
AI is rapidly changing how we see and interact with our world, but the power
required to run it is creating problems for data centers, power delivery
infrastructure and, potentially, the long-term health of the
environment. In this talk we will look
at how AI can be designed and run more sustainably, from server design to data
center operation. The power grid, the
environment and IT budgets all mandate that we rethink the way we operate and
cool these amazing AI systems.
Bio:
Scott is Vice President and General Manager of Lenovo’s Artificial
Intelligence (AI) and High Performance Computing (HPC) businesses; he is also
the lead executive for Lenovo’s data-center-focused environmental and
sustainability efforts. He has been with Lenovo since 2014, following the
acquisition of IBM’s System x team. Prior to this, he spent fourteen years as
a member of the IBM System x executive team.
He and his team are responsible for Lenovo’s end-to-end AI and HPC
strategies, focused on leadership in the mid-market and a strong presence in
the TOP500. Lenovo is focused on
bringing ‘exascale’-level capabilities to users at ‘everyscale’ while doing
so as sustainably as possible.
Back
to Session I
|
Our first move and second step toward
"HPC-Oriented" Quantum-HPC Hybrid platform software
Miwako Tsuji
RIKEN Center for Computational Science, Kobe, Japan
We started the development of the quantum-HPC hybrid platform last year. In
this talk, we present several prototype implementations and preliminary
experimental results. We also present the design of the quantum-HPC hybrid
platform software, which was defined based on the preliminary experiments and
extensive discussions with researchers in quantum hardware, quantum SDKs, and
related areas.
Our "HPC-oriented" design exploits supercomputers’ performance efficiently
and provides flexible solutions to support different kinds of quantum-HPC
hybrid applications.
Back to Session IX
|
Navigating AI’s impact on energy efficiency
and resource consumption
Andrew Wheeler
HPE Fellow & VP, Hewlett Packard Labs, Fort
Collins, CO, USA
The world has recently
witnessed an unprecedented acceleration in the demand for machine learning
and AI applications. This spike in demand has imposed tremendous strain on
today’s technology performance, power, and energy consumption. Future trends
indicate unsustainable spending and a widening technology gap. This talk will
examine promising technologies with the potential to bring orders-of-magnitude
improvements to the growing cost, energy, and performance challenges.
Back to Session I
|
Lessons Learned from Pre-training Large
Language Models
Rio
Yokota
Tokyo Institute of
Technology, Tokyo, Japan
Since the release of ChatGPT, there have been many
efforts to pre-train large language models (LLMs) with capabilities similar
to those of ChatGPT. In Japan, there are efforts to train LLMs with strong
Japanese capabilities and a good understanding of Japanese culture. However,
since English is the dominant language on the internet, it is difficult to
find high-quality Japanese text data in quantities comparable to the English
data commonly used to train LLMs. There are also many challenges with the
training itself, since many types of distributed parallelism need to be
combined to extract the full potential of GPU supercomputers. Since the runs
can take months on thousands of nodes, hardware failure is another problem
that cannot be neglected. In this talk I will summarize the lessons learned
from three different projects in Japan to pre-train LLMs.
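As a hedged, back-of-the-envelope illustration of why these parallelism types must be combined carefully (the numbers and the helper below are hypothetical, not taken from the projects discussed), the data-, tensor-, and pipeline-parallel degrees must multiply to the total GPU count:

# Hypothetical sanity check for a 3D-parallel pre-training configuration;
# all numbers are illustrative, not from the projects discussed in the talk.
def check_parallel_config(total_gpus: int, tensor_parallel: int,
                          pipeline_parallel: int) -> int:
    """Return the implied data-parallel degree, or raise if the split is invalid."""
    model_parallel = tensor_parallel * pipeline_parallel
    if total_gpus % model_parallel != 0:
        raise ValueError("tensor * pipeline degrees must divide the total GPU count")
    return total_gpus // model_parallel  # number of data-parallel replicas

# Example: 1,000 nodes with 8 GPUs each, tensor-parallel 8, pipeline-parallel 10
# -> 100 data-parallel replicas of the model.
print(check_parallel_config(total_gpus=8000, tensor_parallel=8, pipeline_parallel=10))

On runs that last months on thousands of nodes, hardware failures are expected, so checkpoint frequency is typically planned alongside the parallelism layout to bound the work lost per failure.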
Back to Session IV
|