Computing the Brain
Katrin Amunts
Human Brain Project,
Chair of the Science and Infrastructure Board / Scientific Research Director,
Institute for Neuroscience and Medicine, Structural and Functional
Organisation of the Brain, Forschungszentrum Juelich GmbH, Juelich, Germany
and
Institute for Brain Research, Heinrich Heine
University Duesseldorf, University Hospital Duesseldorf, Germany
Neuroscience research covers a broad spectrum of empirical and theoretical approaches, with increasing demands for computation, data handling, analytics and storage. This is true in particular for research targeting the human brain, with its incredibly high number of neurons forming complex networks. Demands for HPC arise from a heterogeneous portfolio of neuroscientific approaches: (i) studying human brains at the cellular and ultrastructural level, with petabytes of data for a single brain; (ii) the reconstruction of the human connectome, i.e., the totality of connections between nerve cells and the ways they interact; (iii) modeling and simulation at different levels of brain organization with ever finer detail, to make models biologically more realistic; (iv) the analysis of large, multimodal data sets using workflows that employ deep learning and machine learning, simulation, graph-based inference, etc.; (v) large cohort studies including many thousands of subjects, with data from neuroimaging, behavioral tests, genetics, biochemical markers, etc., to reveal relationships between genes, environment and the brain while accounting for large variations between subjects. To address these diverse requirements, the Human Brain Project is developing its digital research infrastructure EBRAINS, with FENIX as the HPC platform for Computing the Brain.
The Role of EOFS and the Future of Parallel
File Systems for HPC
Frank Baetke
EOFS European Open File System Organization
formerly Hewlett Packard Enterprise, Munich, GERMANY
Parallel file systems are an essential part of almost all HPC systems. The need for this architectural concept originated with the growing influence, and eventually the complete takeover, of the HPC spectrum by parallel computers, defined either as clusters or as MPPs in the nomenclature of the TOP500.
A major step towards parallel file systems for the high end of HPC occurred around 2001, when the US DoE funded the development of such an architecture, called LUSTRE, as part of the ASCI Path Forward project, with external contractors that included Cluster File Systems Inc. (CFS), Hewlett Packard and Intel. The acquisition of the assets of CFS by SUN Microsystems in 2007 and SUN's subsequent acquisition by ORACLE in 2010 led to a crisis, with the cancellation of future work on LUSTRE.
To save the assets and ensure further development, a few HPC-focused individuals founded organizations such as EOFS, OpenSFS and Whamcloud to move LUSTRE to community-driven development. In 2019, EOFS and OpenSFS jointly acquired the LUSTRE trademark, logo and related assets.
In Europe, development of a parallel file system focused on HPC began in 2005 at the German Fraunhofer Society, also as an open-source project, dubbed FhGFS (Fraunhofer Global Parallel File System). Driven by its spin-off ThinkParQ and renamed BeeGFS, it has since gained worldwide recognition and visibility.
In contrast to these community-driven open-source concepts, several proprietary parallel file systems are in wide use, with IBM's Spectrum Scale (originally known as GPFS) having the lead in HPC, with a significant number of installations in the upper ranks of the TOP500 list. But there are other interesting proprietary concepts with specific areas of focus and related benefits.
In this talk we will review the role of EOFS (European Open File Systems SCE) and provide an outlook on the future of the HPC parallel file system landscape.
Note: all trademarks are the property of their
respective owners
High Performance
Computing for Bioinformatics
Mario Cannataro
Department of Medical and Surgical Sciences,
University of Catanzaro, ITALY
Omics sciences (e.g. genomics, proteomics, and interactomics) are attracting increasing interest in the scientific community due to the availability of novel, high-throughput platforms for the investigation of the cell machinery, and they play a central role in so-called P4 (predictive, preventive, personalized and participatory) medicine, and in particular in cancer research. High-throughput experimental platforms and clinical diagnostic tools, such as next-generation sequencing, microarrays, mass spectrometry, and medical imaging, are producing overwhelming volumes of molecular and clinical data, and the storage, integration, and analysis of such data is today the main bottleneck of bioinformatics pipelines.
This Big Data trend in bioinformatics poses new challenges both for the efficient storage and integration of the data and for its efficient preprocessing and analysis. Thus, managing omics and clinical data requires both infrastructure and space for data storage as well as algorithms and software pipelines for data preprocessing, integration, analysis, and sharing. Moreover, as is already happening in several application fields, the service-oriented model enabled by the Cloud is spreading more and more in bioinformatics.
Parallel Computing offers the computational power to
face this Big Data trend, while Cloud Computing is a key technology to hide
the complexity of computing infrastructures, to reduce the cost of the data
analysis task, and to change the overall model of biomedical and
bioinformatics research towards a service-oriented model.
The talk introduces the main types of omics data (e.g. gene expression and SNPs, mass spectra, protein-protein interactions) and discusses some parallel and distributed bioinformatics tools and their application in real case studies in cancer research, as well as recent initiatives exploiting international Electronic Health Records to face COVID-19, including:
- preprocessing and mining of microarray data for pharmacogenomics applications,
- alignment of biological networks, community detection, and applications to the brain connectome,
- integrative bioinformatics, integration and enrichment of biological pathways,
- analysis of international Electronic Health Records to face the COVID-19 pandemic: the Consortium for Clinical Characterization of COVID-19 by EHR (4CE).
Short bio
Mario Cannataro is a Full Professor of computer engineering at the
University "Magna Græcia" of Catanzaro,
Italy, and the Director of the Data Analytics Research Center.
His current research interests include parallel computing, bioinformatics, health informatics, and artificial intelligence. He has published three books and more than 300 papers in international journals and conference proceedings.
Mario Cannataro is a Senior Member of ACM, ACM SIGBio,
IEEE, BITS (Bioinformatics Italian Society) and SIBIM (Italian Society of
Biomedical Informatics).
High Performance Computing and Cloud
Computing, key enablers for digital transformation
Carlo Cavazzoni
Leonardo S.p.A., Head of Cloud Computing,
Director High Performance Computing Lab, Chief Technology & Innovation
Office, Genova, Italy
For many industries, HPC is becoming a key technology for competitiveness and digitalization. In particular, every industry will have to apply digital technologies, and this determines a paradigm shift: the value of goods and services moves from the exploitation of physical systems to the exploitation of knowledge. AI, computer simulations and other digital technologies are tools that help mine out more knowledge, faster. The more, the better.
In this scenario HPC is a tool: a tool to process Big Data, enable AI and perform simulations, and it is more and more often combined with Cloud Computing services (virtual machines and containers, which are especially popular for Big Data and AI frameworks).
HPC can accelerate the creation of value thanks to
the capability to generate new knowledge and perform more accurate
predictions (e.g. developing Digital Twins).
While computational capacity is a fundamental resource for competitiveness, raw computational capacity alone is useless: software is the key to unlocking the value. This is why, besides the supercomputer, we need to build the capability to implement new applications or improve existing ones.
In the talk I will present how Leonardo, with the key contribution of the HPC Lab, intends to implement leadership-class software tools and a computational infrastructure able to add value to the company and ultimately transform it, to become more digital than physical.
A domain wall encoding of variables for
quantum annealing
Nicholas Chancellor
Department of Physics, Durham University,
United Kingdom
I will discuss the application of a relatively new
method for encoding discrete variables into binary ones on a quantum annealer. This encoding is based on the physics of domain
walls in frustrated Ising spin chains and can be
shown to perform better than the traditional one-hot encoding both in terms
of efficiency of embedding the problems into quantum annealers
and in terms of performance on actual devices.
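For readers unfamiliar with the two encodings, the following is a minimal, illustrative Python sketch (not code from the talk or the cited papers) of how a single discrete variable maps onto binary variables under each scheme. In practice the annealer additionally enforces validity through penalty terms (exactly one bit set for one-hot, a monotone chain of 1s followed by 0s for domain-wall), and the smaller variable count and simpler penalties of the domain-wall encoding are where its embedding advantage comes from.

```python
# Illustrative sketch only: a discrete variable with N values needs N binary
# variables in one-hot encoding but only N-1 in the domain-wall encoding.

def one_hot_encode(value, n_values):
    """One-hot: exactly one of N bits is set."""
    return [1 if i == value else 0 for i in range(n_values)]

def domain_wall_encode(value, n_values):
    """Domain-wall: N-1 bits forming a '1...10...0' string; the value is
    the position of the 1 -> 0 transition (the domain wall)."""
    return [1 if i < value else 0 for i in range(n_values - 1)]

def domain_wall_decode(bits):
    """For a valid monotone string, the value is the number of 1s."""
    return sum(bits)

# Example: a 4-valued variable taking the value 2.
print(one_hot_encode(2, 4))           # [0, 0, 1, 0]
print(domain_wall_encode(2, 4))       # [1, 1, 0]
print(domain_wall_decode([1, 1, 0]))  # 2
```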
I will first review this encoding strategy and contrast it with the one-hot technique, along with numerical evidence of an embedding advantage following the discussion in [Chancellor, Quantum Sci. Technol. 4 045004]. Next, I will discuss recent experimental evidence presented in [Chen, Stollenwerk, Chancellor, arXiv:2102.12224], which shows that this encoding can lead to a large improvement in the performance of quantum annealers on coloring problems. This improvement is large enough that using the domain-wall encoding on an older-generation D-Wave 2000Q quantum processing unit yields superior results to using the one-hot encoding on the more advanced Advantage QPU, indicating that better encoding can make a large difference in performance. Additionally, I will touch on some more recent work involving the quadratic assignment problem. Finally, I will discuss the importance of this encoding for the simulation of quantum field theories directly on transverse Ising model quantum annealers [Abel, Chancellor, Spannowsky, Phys. Rev. D 103, 016008].
Quantum Computer, dream or reality?
Daniele Dragoni
Leonardo
S.p.A., High Performance Computing Lab, Genova, ITALY
As the miniaturization of semiconductor transistors approaches its physical limits, the performance increase of microprocessors is slowing down to the point that the operating-frequency gain from one chip generation to the next is almost nil. In an attempt to keep up with Moore's law, computing architectures have evolved to take full advantage of parallelization schemes: vector units, multicore, GPUs, etc. Following the current trends, however, it will never be possible to efficiently address certain computational tasks of practical interest.
In this scenario, it is clear that any approach that radically overcomes the limitations of digital computing is highly interesting. In particular, quantum-computing devices, which operate by exploiting the principles of quantum physics, are believed to provide a route to such a paradigm shift. In practice, however, building a quantum computer is an almost unmatched engineering challenge (comparable to nuclear fusion). To date, quantum computers have been built with very few logical units (qubits), and it is not yet fully clear if and when they will prove superior to digital computers on concrete problems of practical interest. In the presentation we will introduce the research streams that we are following in the quantum computing domain, from quantum-inspired methods up to real quantum applications that we will test on simulated and physical quantum computers. Finally, we will analyze the elements and steps to consider for the introduction of quantum computing within our own infrastructure.
HPTMT:
High-Performance Data Science and Data Engineering based on Data-parallel
Tensors, Matrices, and Tables
Geoffrey Fox
School of Informatics, Computing and
Engineering, Department of Intelligent Systems Engineering; Digital Science Center and Data Science Program, Indiana University Bloomington, IN, USA
The continuously increasing size and complexity of
data-intensive applications demand high-performance but still highly usable
environments. We integrate a set of ideas developed in various data science
and data engineering frameworks. They employ a set of operators on specific
data abstractions that include vectors, matrices, arrays, tensors, graphs,
and tables. Our key concepts are inspired by systems like MPI, HPF (High-Performance Fortran), NumPy, Pandas, Spark, Modin, PyTorch, TensorFlow, RAPIDS (NVIDIA), and oneAPI (Intel). Further, it is crucial to support the different languages in everyday use in the Big Data arena, including Python, R, C++, and Java. We note the importance of Apache Arrow and Parquet for enabling language-agnostic high performance and interoperability. We identify the fundamental principles of an operator-based architecture for data-intensive applications that are needed for success in performance and usability. We illustrate these principles with a discussion of examples using our software environments Cylon and Twister2, which embody HPTMT. We also describe results from benchmarks being developed by MLCommons (MLPerf).
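To give a concrete flavor of the operator-on-table style described above, here is a small single-node sketch using pandas and PyArrow (both mentioned in the abstract); it is an illustrative analogue, not the Cylon or Twister2 API, and the input file names are hypothetical. Distributed frameworks apply the same operators (join, filter, group-by, aggregate) to partitioned tables across many workers.

```python
# Illustrative single-node analogue of operator-based data engineering;
# the file and column names are hypothetical placeholders.
import pandas as pd
import pyarrow.parquet as pq

# Parquet provides language-agnostic, columnar on-disk tables.
orders = pq.read_table("orders.parquet").to_pandas()
users = pq.read_table("users.parquet").to_pandas()

result = (
    orders.merge(users, on="user_id", how="inner")   # join operator
          .query("amount > 100")                     # filter operator
          .groupby("country", as_index=False)        # group-by operator
          .agg(total=("amount", "sum"))              # aggregation operator
)
print(result.head())
```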
Deep Learning for Time Series
Geoffrey Fox
School of Informatics, Computing and
Engineering, Department of Intelligent Systems Engineering; Digital Science Center and Data Science Program, Indiana University Bloomington, IN, USA
We show that one can study several sets of sequences or time series in terms of an underlying evolution operator, which can be learned with a deep learning network. We use the language of geospatial time series, as this is a common application type, but the series can be any sequence, and the sequences can be in any collection (bag), not just those in Euclidean space-time: we just need sequences labeled in some way, with properties that depend on this label (a position in an abstract space). This problem has been successfully tackled by deep learning in many ways and in many fields. Comparing deep learning for such time series with the coupled ordinary differential equations used to describe multi-particle systems motivates the introduction of an evolution operator that describes the time dependence of complex systems. With an appropriate training process, we interpret deep learning applied to spatial time series as a particular approach to finding the time evolution operator for the complex system giving rise to the spatial time series. Whimsically, we view this training process as determining hidden variables that represent the theory (as in Newton's laws) of the complex system. We apply these ideas to predicting COVID infections and earthquake occurrences.
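As a rough sketch of the evolution-operator idea (an illustrative PyTorch example, not the speaker's code; the data here are synthetic), one can train a network f_theta so that x_{t+1} ≈ f_theta(x_t) over all observed sequences and then iterate it forward to forecast:

```python
# Minimal illustrative sketch: learn a one-step evolution operator from
# sequences (synthetic data), then roll it forward to forecast.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
# 100 synthetic "spatial time series": 50 time steps at 8 locations.
series = np.cumsum(rng.normal(size=(100, 50, 8)).astype(np.float32), axis=1)
x_t = torch.tensor(series[:, :-1].reshape(-1, 8))   # states at time t
x_t1 = torch.tensor(series[:, 1:].reshape(-1, 8))   # states at time t+1

f_theta = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(f_theta(x_t), x_t1)
    loss.backward()
    opt.step()

# Forecast five steps ahead by repeatedly applying the learned operator.
state = torch.tensor(series[0, -1])
for _ in range(5):
    state = f_theta(state)
print(state.detach().numpy())
```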
An automated, self-service, multi-cloud
engineering simulation platform for a complex living heart simulation
workflow with ML
Wolfgang Gentzsch
The UberCloud,
Germany and Sunnyvale, CA, USA
Co-authors: Daniel Gruber, Director of
Architecture at UberCloud; Yaghoub
Dabiri, Scientist at 3DT Holdings; Julius Guccione, Professor of Surgery at the UCSF Medical Center, San Francisco; and Ghassan
Kassab, President at California Medical Innovations
Institute, San Diego.
Many companies are finding that replicating an existing on-premises HPC architecture in the Cloud does not lead to the desired breakthrough improvements. With this in mind, a fully automated, self-service, multi-cloud Engineering Simulation Platform has been developed from day one, resulting in greatly increased productivity for HPC engineers, significantly improved IT security, cloud costs and administrative overhead reduced to a minimum, and full control for engineers and corporate IT over their HPC cloud environment and corporate assets.
This platform has been implemented on Google Cloud Platform (GCP) for 3DT Holdings for their highly complex Living Heart Project and Machine Learning, with the final result of reducing simulation times from many hours per simulation to just a few seconds for a highly accurate prediction of optimal medical device placement during heart surgery.
The team ran the 1500 simulations needed to train the ML algorithm. The whole simulation process followed a multi-cloud approach, with all computations running on 1500 HPC clusters in Google GCP, and with management, monitoring, and health checks orchestrated from the Azure Cloud and performed through SUSE's Kubernetes management platform Rancher.
Technology used: UberCloud Engineering Simulation Platform, multi-node HPC-enhanced Docker containers, Kubernetes, SUSE Rancher, Dassault Abaqus, TensorFlow, preemptible GCP instances (c2-standard-60), managed Kubernetes clusters (GKE), Google Filestore, Terraform, and DCV remote visualization.
Dynamic Decentralized Workload Scheduling for
Cloud Computing
Vladimir Getov
Distributed and Intelligent Systems Research
Group, School of Computer Science and Engineering, University of Westminster,
London, UNITED KINGDOM
Virtualized frameworks typically form the foundations
of Cloud systems, where Virtual Machine (VM) instances provide execution
environments for a diverse range of applications and services. Modern VMs
support Live Migration (LM) – a feature wherein a VM instance is transferred
to an alternative node dynamically without stopping its execution. This paper
presents a detailed design of a decentralized agent-based scheduler, which
can be used to manage workloads within the computing cells of a Cloud system
using Live Migration. Our proposed solution is based on the concept of service allocation negotiation, whereby all system nodes communicate among themselves and the scheduling logic is decentralized. The presented architecture has been implemented and evaluated through multiple simulation runs using real-world workloads.
The focus of this research is to analyze and evaluate the LM transfer cost, which we define as the total size of data to be transferred to another node for a particular migrated VM instance. Several different virtualization approaches are categorized, with a shortlist of candidate VMs selected for evaluation. The paper highlights the major areas of the LM transfer process (CPU registers, memory, permanent storage, and network switching) and analyzes their impact on the volume of information to be migrated, which includes the VM instance with the required libraries, the application code and any data associated with it.
Then, using several representative applications, we report experimental
results for the transfer cost of LM for respective VM instances. We also
introduce a novel Live Migration Data Transfer (LMDT) formula, which has been
experimentally validated and confirms the exponential nature of the LMDT
process. Our estimation model supports efficient design and development
decisions in the process of analyzing and building
modern Cloud systems based on dynamic decentralized workload scheduling.
Practical Quantum Computing
Victoria Goliber
Senior Technical Analyst, D-Wave Systems
Inc., GERMANY
D-Wave's mission is to unlock the power of quantum
computing for the world. We do this by delivering customer value with
practical quantum applications for a diverse set of problems. Join us to
learn about the tools that D-Wave has available and how they are impacting businesses
around the world. We’ll conclude with a live demo showing how easy it is to
get started and build quantum applications today.
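To give a flavor of what such a getting-started example typically looks like, here is a generic sketch using D-Wave's open-source Ocean SDK; it is not the demo from the talk, and it assumes the Ocean SDK is installed and an API token has been configured.

```python
# Generic Ocean SDK sketch (assumes `pip install dwave-ocean-sdk` and a
# configured API token); not the specific demo shown in the talk.
from dimod import BinaryQuadraticModel
from dwave.system import DWaveSampler, EmbeddingComposite

# Tiny example QUBO: the optimum prefers x0 and x1 to disagree.
Q = {("x0", "x0"): -1, ("x1", "x1"): -1, ("x0", "x1"): 2}
bqm = BinaryQuadraticModel.from_qubo(Q)

sampler = EmbeddingComposite(DWaveSampler())   # routes the problem to a QPU
sampleset = sampler.sample(bqm, num_reads=100)
print(sampleset.first.sample, sampleset.first.energy)
```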
AIOps as a future of Cloud Operations
Odej Kao
Distributed and Operating Systems Research
Group and Einstein Center Digital Future, Berlin
University of Technology, GERMANY
Artificial Intelligence for IT Operations (AIOps) combines big data and machine learning to take over a broad range of IT operations tasks, including availability and performance management and the monitoring of services. By exploiting log, tracing, metric, and network data, AIOps aims at detecting service and system anomalies before these turn into failures. This talk will present methods developed for automated anomaly detection, root cause analysis, remediation, optimization, and the automated initiation of self-stabilizing activities. Extensive experimental measurements and initial results show that AIOps platforms can help to reach the required level of availability, reliability, dependability, and serviceability for future settings where latency and response times are of crucial importance. While the automation is mandatory due to the system complexity and the criticality of a QoS-bounded response, the measures compiled and deployed by the AI-controlled administration are not easily understood or reproduced. Therefore, the explainability of actions taken by the automated system is becoming a regulatory requirement for future IT infrastructures. Finally, we describe logsight.ai, a system we developed and deployed, as an example of the design of the corresponding architecture, tools, and methods.
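As a toy illustration of the kind of metric-based anomaly detection that AIOps pipelines build on, here is a simple rolling z-score check (a generic sketch with synthetic data, far simpler than the log- and trace-based methods discussed in the talk):

```python
# Toy metric anomaly detector using a rolling z-score; real AIOps systems
# combine logs, traces, metrics and learned models.
import numpy as np

def rolling_zscore_anomalies(values, window=30, threshold=3.0):
    """Flag indices that deviate strongly from the recent rolling mean."""
    values = np.asarray(values, dtype=float)
    anomalies = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mu, sigma = recent.mean(), recent.std()
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Synthetic latency series with an injected spike at index 80.
rng = np.random.default_rng(1)
latency = rng.normal(100, 5, size=120)
latency[80] = 200
print(rolling_zscore_anomalies(latency))  # expected to include 80
```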
CV Odej Kao
Odej Kao is a full professor at Technische Universität Berlin, head of the research group on distributed and operating systems, chairman of the Einstein Center Digital Future with 50 interdisciplinary professors, and chairman of the DFN board. Moreover, he is the CIO of the university and a principal investigator in the national centers on Big Data and on Foundations of Learning and Data. Dr. Kao is a graduate of TU Clausthal (master's degree in computer science in 1995, PhD in 1997, habilitation in 2002). In 2002, Dr. Kao joined Paderborn University as an associate professor for operating systems and director of the center for parallel computing. In 2006, he moved to Berlin and focused his research on AIOps, big data / streaming analytics, cloud computing, and fault tolerance. He has published over 350 papers in peer-reviewed proceedings and journals.
Building
the European EuroHPC Ecosystem
Kimmo Koski
CSC - Finnish IT Center
for Science, Espoo, Finland
LUMI is one of the three pre-exascale systems acquired by the EuroHPC Joint Undertaking (JU); once fully operational, it will provide more than 500 PF of computing power for European research and industry. The system, hosted by CSC, the Finnish IT Center for Science, and run by a consortium of 10 European countries, will be installed in two phases: the first parts during the summer of 2021 and the rest at the end of 2021.
LUMI will be an essential part of European HPC collaboration and one of the main platforms for European research. It will fit together with the other EuroHPC sites, such as the pre-exascale systems in Spain and Italy, five petascale systems and future exascale installations, which together form the European HPC ecosystem.
The talk introduces LUMI and its role in the European HPC ecosystem, and discusses the various aspects motivating the architectural and functional choices made when building an international collaboration with heterogeneous resources located in different countries. It discusses the different needs and priorities of research that drive decisions aimed at optimal performance for the most challenging applications, and addresses the benefits obtained for research and industry.
The talk also covers the eco-efficient, low-carbon-footprint operational environment and its impact on the European Green Deal. In addition, it analyzes the opportunities for developing a European competitive advantage through intensive collaboration in building the European EuroHPC ecosystem.
Exascale Programming Models for Heterogeneous Systems
Stefano Markidis
KTH Royal Institute of Technology, Computer
Science Department / Computational Science and Technology Division, Stockholm, SWEDEN
The first exascale supercomputer is likely to come online soon. A production-quality programming environment, probably based on existing dominant programming interfaces such as MPI, needs to be in place to support application deployment and development on exascale machines. The most striking characteristic of an exascale supercomputer will be the amount of available parallelism required to break the exaFLOPS barrier with the High-Performance LINPACK benchmark. The first exascale machine will provide programmers with between 100 million and a billion threads. The second characteristic of an exascale supercomputer will be the high level of heterogeneity of the compute and memory subsystems. This drastically increases the number of FLOPS per Watt, making it feasible to build an exascale machine within a power budget in the 20-100 MW range. Low-power microprocessors, accelerators, and reconfigurable hardware are the main design choices for an exascale machine. This heterogeneity in compute will also be accompanied by deeper memory hierarchies comprising high-performance and low-power memory technologies.
While it is not yet evident what the best programming approach for developing applications on large-scale heterogeneous supercomputers will be, a consensus in the HPC community is that programmers need an extension of the dominant programming models to ensure the programmability of new architectures. In this talk, I introduce the EPiGRAM-HS project, which addresses the heterogeneity challenge of programming exascale supercomputers. EPiGRAM-HS improves their programmability by extending MPI and GASPI to exploit accelerators, reconfigurable hardware, and heterogeneous memory systems. In addition, EPiGRAM-HS places MPI and GASPI at the core of the software stack and extends programmability and productivity with additional software layers.
Brain-like Machine Learning and HPC
Stefano Markidis
KTH Royal Institute of Technology, Computer
Science Department / Computational Science and Technology Division, Stockholm, SWEDEN
Modern deep learning methods based on backpropagation have surged in popularity and have been used in multiple domains and application areas. At the same time, there are other machine learning algorithms inspired by modern models of how the brain's neocortex functions. Unlike traditional deep learning, these models use a localized (and unsupervised) brain-like rule to determine the neural network's weights and biases. The learning of the graph connection weights complies with Hebb's postulate: learning depends only on the local information available from the activities of the pre- and post-synaptic units. A Hebbian learning rule allows higher scalability and better utilization of HPC systems. In this talk, I introduce brain-like machine learning and describe the Bayesian Confidence Propagation Neural Network (BCPNN), one of the most established brain-inspired machine learning methods. I also discuss the potential for these emerging methods to exploit HPC systems and present an HPC BCPNN implementation, called StreamBrain, for CPUs, GPUs, and FPGAs.
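As a minimal illustration of Hebb's postulate mentioned above, here is a generic Hebbian outer-product weight update (an illustrative sketch, not the BCPNN learning rule used by StreamBrain). Because each synapse is updated using only the activities of the two units it connects, all updates are independent and map naturally onto parallel hardware.

```python
# Generic Hebbian weight update (illustrative; not the BCPNN rule):
# each synapse w[i, j] changes using only local pre- and post-synaptic
# activity, so all synapse updates can be computed in parallel.
import numpy as np

def hebbian_update(weights, pre, post, lr=0.01):
    """weights: (n_post, n_pre); pre: (n_pre,); post: (n_post,)."""
    return weights + lr * np.outer(post, pre)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 3))
pre = np.array([1.0, 0.0, 1.0])        # pre-synaptic activities
post = np.array([0.0, 1.0, 1.0, 0.0])  # post-synaptic activities
w = hebbian_update(w, pre, post)
print(w)
```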
Data Analytics and AI on HPC Systems: About
the impact on Science
Wolfgang Nagel
Center for Information Services and High Performance Computing, Technische Universitaet
Dresden, GERMANY
Methods and techniques of Artificial Intelligence (AI) and Machine Learning (ML) have been investigated for decades in pursuit of a vision where computers can mimic human intelligence. In recent years, these methods have become more mature and, in some specialized applications, have evolved to super-human abilities, e.g. in image recognition or in games such as Chess and Go. Nonetheless, formidable questions remain in the areas of fundamental algorithms, training data usage, and explainability of results, to name a few. The developments in AI, and especially in ML, have been boosted by powerful HPC systems, mainly driven by the GPU architectures built into many if not most HPC systems these days. The talk will explain the challenges of integrating AI and HPC into “monolithic” systems, and it will provide a broad overview of the impact that the availability of such systems will have on the science system.
Cloud Native Supercomputing
Gilad Shainer
NVIDIA, Menlo Park, CA, USA
High performance computing and Artificial Intelligence are the most essential tools fueling the advancement of science. To handle the ever-growing demands for higher computational performance and the increasing complexity of research problems, the world of scientific computing continues to reinvent itself at a fast pace. The session will review the recent development of the cloud-native supercomputing architecture, which aims to bring together bare-metal performance and cloud services.
Towards an Active Memory Architecture for
Time-Varying Graph-based Execution
Thomas Sterling
School of Informatics, Computing and
Engineering and AI Computing Systems Laboratory, Indiana University,
Bloomington, IN, USA
A diversity of new GPUs and special-purpose devices is under development and in production for the significant acceleration of a wide range of Machine Learning and AI applications. For such problems exhibiting high data reuse, these emerging platforms hold great promise in commercial, medical, and defense domains. For workflows heavily dependent upon irregular graph structures with rapidly changing topologies defined by intra-graph metadata such as links, edges, and arcs, a new generation of innovative memory-centric architectures is quite literally being invented, some by entrepreneurs through new start-up companies. Integration and tight coupling of memory with support logic has decades of prior experimentation behind it. The new generation of architectural innovation is being
pursued to address such challenges as latency hiding, global naming, graph
processing idioms, and associated overheads for AI, ML, and AMR. Chief among
these is extreme scalability at the limitations of Moore’s Law and nanoscale
semiconductor fabrication technology. The Active Memory Architecture (AMA) is
one possible new class of graph-driven memory-centric architecture. The AMA
is under development, supported by NASA, to exploit opportunities exposed by
classic von Neumann architecture cores and advanced concepts for graph
processing. This address will present the innovative principles being
explored through the AMA and describe a prototype currently under testing.
All questions from the audience will be welcome throughout the presentation.
Brief Biography
Thomas Sterling is a Full Professor of Intelligent Systems Engineering at Indiana University (IU), serving as Director of the AI Computing Systems Laboratory at IU’s Luddy School of Informatics, Computing, and Engineering. Since receiving his Ph.D. from MIT as a Hertz Fellow, Dr. Sterling has engaged in applied research in parallel computing system structures, semantics, and operation in industry, government labs, and academia. Dr. Sterling is best known as the "father of Beowulf" for his pioneering research in commodity/Linux cluster computing, for which he shared the Gordon Bell Prize in 1997. His current research is associated with innovative extreme-scale computing through memory-centric non-von Neumann architecture concepts to accelerate dynamic graph processing. In 2018, he co-founded the new tech company Simultac and serves as its President and Chief Scientist. Dr. Sterling was the recipient of the 2013 Vanguard Award and is a Fellow of the AAAS. He is the co-author of seven books and holds six patents. Most recently, he co-authored the introductory textbook “High Performance Computing”, published by Morgan Kaufmann in 2017.
Parallel Runtime Systems for Dynamic Resource
Management and Task Scheduling
Thomas Sterling
School of Informatics, Computing and
Engineering and AI Computing Systems Laboratory, Indiana University,
Bloomington, IN, USA
Runtime systems, implemented principally in software, play diverse roles in the management of resources, expanding dynamic control and filling perceived gaps between compilers and operating systems on the one hand and hardware execution on the other. They can add workflow management of distributed processing components or more fine-grained supervision for optimal efficiency and scaling through introspection. MPI, OpenMP and other user programming interfaces (e.g., Java, Python, Lisp) incorporate some runtime functionality, as do SLURM, Charm++ and Cilk++, to mention a few. Legion, Habanero and HPX operate at the intra-application multi-thread level. Detailed experiments with HPX-5 explored the potential advantages but also the limitations of runtime functionality and its sensitivity to application flow-control properties. This presentation describes and discusses the findings and conclusions of this investigation, demonstrating the potential improvements in some cases but also areas in which runtimes may prove a hindrance due to software overheads with little or no gain. The talk concludes by considering future runtimes optimized for objective functions other than conventional ALU/FPU utilization, such as memory bandwidth or latency. Also exposed are possible targets for hardware mechanisms offering greater efficiency and scalability through the reduction of overhead times.
Knowing your quantum computer: benchmarking, verification and
classical simulation at scale
Sergii Strelchuk
Department of Applied Mathematics and
Theoretical Physics and Centre for Quantum Information and Foundations,
University of Cambridge, United Kingdom
To ensure that a quantum device operates as expected, we need to check its functioning on two levels. At a lower level, we need to map all the noise sources and ensure they do not render our device classical. At a higher level, we need a practical method to confirm that the output produced by the quantum computer for the computational problem can be trusted. In this talk, I will explain the core ideas behind these tasks and discuss the unexpected role of classical simulability, which emerges in the above scenarios.
Quantum computing for natural sciences and
machine learning applications
Francesco Tacchino
Quantum Applications Researcher, IBM Quantum,
IBM Research – Zurich, Switzerland
The future of computing is being shaped today around
rapidly growing technologies, such as quantum and neuromorphic systems, in
combination with high performance classical architectures. In the coming
years, these innovative information processing paradigms may radically
transform and accelerate the mechanisms of scientific discovery, potentially
opening new avenues of research.
In particular, quantum computing could offer
scalable and efficient solutions for many classically intractable problems in
different domains including physics, chemistry, biology and medicine, as well
as optimisation, artificial intelligence and finance. In this talk, I will
review the state-of-the-art and recent progress in the field, both in terms
of hardware and software, and present some advanced applications, with a
focus on natural sciences, materials design and machine learning.
Data-Centric Programming for Large-Scale
Parallel Systems -
The DCEx Model
Domenico Talia
Department of Computer Engineering,
Electronics, and Systems and DtoK Lab,
University of Calabria, ITALY
For designing scalable parallel applications, data-oriented programming models are effective solutions based on the exploitation of local data structures and on limiting the amount of data shared among parallel processes. This talk discusses the main features and programming mechanisms of the DCEx programming model, designed for the implementation of data-centric large-scale parallel applications. The basic idea of the DCEx model is to structure programs into data-parallel blocks managed by a large number of parallel threads. Parallel blocks are the units of distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Threads execute close to data, using near-data synchronization according to the PGAS model. A machine learning use case is also discussed, showing the DCEx features for exascale programming.
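To give a rough intuition of the data-parallel-block idea, here is a single-node Python analogue (illustrative only, not the DCEx API): the data is partitioned into blocks, each worker computes on its own local block, and only a small shared result per block is exchanged and combined.

```python
# Single-node analogue (illustrative only, not the DCEx API): partition the
# data into blocks, let each worker operate on its own local block, and keep
# the shared state small (here, one partial sum per block).
from concurrent.futures import ThreadPoolExecutor
import numpy as np

data = np.arange(1_000_000, dtype=np.float64)
n_blocks = 8
blocks = np.array_split(data, n_blocks)   # data-parallel blocks

def local_compute(block):
    # Work near the data: only this block is touched here.
    return float(np.sum(block * block))

with ThreadPoolExecutor(max_workers=n_blocks) as pool:
    partial_sums = list(pool.map(local_compute, blocks))

total = sum(partial_sums)   # small shared results combined at the end
print(total)
```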