HPC 2014
High Performance Computing
From Clouds and Big Data to Exascale and Beyond
An International Advanced Workshop
July 7 – 11, 2014
Final Programme
Programme Committee

L. Grandinetti (Chair)
F. Baetke, Hewlett Packard
C. Catlett, Argonne National Lab.
J. Dongarra
S. S. Dosanjh
I. Foster, Argonne National Lab.
G. Fox
W. Gentzsch, The UberCloud and EUDAT
V. Getov
G. Joubert
C. Kesselman
E. Laure, KTH Royal Institute of Technology
T. Lippert, Juelich Supercomputing Centre
M. Livny
I. Llorente, Universidad Complutense de Madrid
B. Lucas
S. Matsuoka, Tokyo Institute of Technology
P. Messina
K. Miura
V. Pascucci
N. Petkov
J. Qiu
S. Sekiguchi
A. Wang
Co-Organizers

L. Grandinetti
T. Lippert, Institute for Advanced Simulation, Juelich Supercomputing Centre
Organizing Committee

L. Grandinetti (Co-Chair)
T. Lippert (Co-Chair)
M. Al-Baali
C. Catlett
J. Dongarra (U.S.A.)
W. Gentzsch
O. Pisacane
M. Sheikhalishahi
Sponsors

ARM
CRAY
DIMES - Department of Computer Engineering, Electronics, and Systems
Dipartimento di Ingegneria dell'Innovazione, Università del Salento
Hewlett Packard
IBM
INTEL
Juelich Supercomputing Centre
KISTI
MELLANOX TECHNOLOGIES
MICRON
National Research Council
PARTEC
Media Partners

HPCwire is the leader in world-class journalism for HPC. With a legacy dating back to 1986, HPCwire is recognized worldwide for its breakthrough coverage of the fastest computers in the world and the people who run them. For topics ranging from the latest trends and emerging technologies, to expert commentary, in-depth analysis, and original feature coverage, HPCwire delivers it all, as the industry's leading news authority and most reliable and trusted resource. Visit HPCwire.com and subscribe today!
Free Amazon Web Services credits for all HPC 2014 delegates

Amazon is very pleased to provide $200 in service credits to all HPC 2014 delegates. Amazon Web Services provides a collection of scalable high performance and data-intensive computing services, storage, connectivity, and integration tools. AWS allows you to increase the speed of research and to reduce costs by providing Cluster Compute or Cluster GPU servers on demand. You have access to a full-bisection, high-bandwidth 10 Gbps network for tightly coupled, I/O-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.
The UberCloud is an online community and marketplace platform where engineers and scientists can discover, try, and buy computing time on demand in the HPC Cloud, paying only for what they use. Please register for the UberCloud Voice newsletter, or to perform an HPC Experiment in the Cloud.
Speakers

Katrin Amunts – Structural and Functional Organization of the Brain (INM-1), Forschungszentrum Jülich
Frank Baetke – Global HPC Programs, Academia and Scientific Research, Hewlett Packard, Palo Alto, CA, USA
Pete Beckman – Director, Exascale Technology and Computing Institute, Argonne National Laboratory, Argonne, IL, USA
Keren Bergman – Department of Electrical Engineering
William Blake – Senior VP and CTO, CRAY
Charlie Catlett – Math & Computer Science Div. and Computation Institute
Alok Choudhary – Northwestern University
Paul Coteus – IBM Research, Data Centric Deep Computing Systems
Patrick Demichel – Strategic System Architect in HPC, Hewlett Packard, Palo Alto, CA, USA
Jack Dongarra – Innovative Computing Laboratory
Mikhail Dorojevets – Stony Brook University, Dept. of Electrical and Computer Engineering, Stony Brook, NY, USA
Sudip S. Dosanjh – Director, National Energy Research Scientific Computing Center (NERSC), USA
Paul F. Fischer – Argonne National Laboratory, Argonne, IL, USA
Ian Foster – Argonne National Laboratory and Dept. of Computer Science
Geoffrey Fox – Community Grid Computing Laboratory, Indiana University
Wolfgang Gentzsch – The UberCloud and EUDAT, Germany
Sergei Gorlatch – Universitaet Muenster, Institut für Informatik, Germany
Richard Graham – Mellanox, Sunnyvale, CA, USA
Bart ter Haar Romeny – Biomedical Image Analysis & Interpretation, Department of Biomedical Engineering, Eindhoven University of Technology, The Netherlands
Takahiro Hirofuchi – Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)
Gerhard Joubert
Carl Kesselman – Information Sciences Institute, Marina del Rey, Los Angeles, CA
Ludek Kucera – Faculty of Mathematics and Physics
Marcel Kunze – Forschungsgruppe Cloud Computing
Erwin Laure – KTH Royal Institute of Technology
John D. Leidel – Micron Technology, Dallas/Fort Worth
Thomas Lippert – Institute for Advanced Simulation, Jülich Supercomputing Centre, and John von Neumann Institute for Computing (NIC); European PRACE IP Projects and DEEP Exascale Project
Guy Lonsdale – Vorstand/CEO, scapos AG, Sankt Augustin
Bob Lucas – Computational Sciences Division, Information Sciences Institute
Stefano Markidis – KTH Royal Institute of Technology, Sweden
Victor Martin-Mayor – Departamento de Fisica Teorica, Universidad Complutense de Madrid, Madrid, Spain
Satoshi Matsuoka – Global Scientific Information and Computing Center & Department of Mathematical and Computing Sciences, Tokyo Institute of Technology
Paul Messina – Argonne National Laboratory, Argonne, IL
Ken Miura – Center for Grid Research and Development, National Institute of Informatics
Mark Moraes – Head, Engineering Group, D. E. Shaw Research
Valerio Pascucci – Center for Extreme Data Management, Analysis and Visualization; Scientific Computing and Imaging Institute; School of Computing
David Pellerin – AWS High Performance Computing, AMAZON
Dana Petcu – Computer Science Department
Judy Qiu – Pervasive Technology Institute, USA
Mark Seager – CTO for HPC Systems, INTEL, Santa Clara, California, USA
Alex Shafarenko – Hatfield
John Shalf – Lawrence Berkeley Laboratory
Thomas Sterling
Rick Stevens – Argonne National Laboratory and Department of Computer Science, The University of Chicago
Domenico Talia – Department of Computer Engineering, Electronics, and Systems
William M. Tang – Dept. of Astrophysical Sciences, Plasma Physics Section; Fusion Simulation Program; and Princeton Institute for Computational Science and Engineering, Princeton, USA
Matthias Troyer – Institut für Theoretische Physik, ETH Zürich, Switzerland
Eric Van Hensbergen – ARM Research
Priya Vashishta – Collaboratory for Advanced Computing and Simulations; Departments of Chemical Engineering & Materials Science, Physics & Astronomy, and Computer Science
Jose Luis Vazquez-Poletti – Distributed Systems Architecture Research Group (DSA-Research.org), Universidad Complutense de Madrid, Spain
Vladimir Voevodin – Research Computing Center, Moscow State University
Robert Wisniewski – Chief Software Architect, Exascale Computing, INTEL Corporation, USA
Workshop Agenda

Monday, July 7th

9:00 – 9:15  Welcome Address

Session: State of the Art and Future Scenarios
9:15 – 9:50  J. Dongarra
9:50 – 10:25  I. Foster
G. Fox – Returning to Java Grande: High Performance Architecture for Big Data
11:00 – 11:30  COFFEE BREAK
S. Matsuoka
R. Stevens – Future Scenarios
12:40 – 13:00  CONCLUDING REMARKS

Session: Emerging Computer Systems and Solutions
F. Baetke
B. Blake
P. Coteus
18:15 – 18:45  COFFEE BREAK
J. Leidel
D. Pellerin – Scalability in the Cloud: HPC Convergence with Big Data in Design, Engineering, Manufacturing
M. Kunze
20:00 – 20:10  CONCLUDING REMARKS
Tuesday, July 8th

Session: Advances in HPC Technology and Systems
S. Gorlatch
A. Shafarenko – Coordination programming for self-tuning: the challenge of a heterogeneous open environment
K. Miura – Prospects for the Monte Carlo Methods in the Million Processor-core Era and Beyond
B. Lucas
V. Martin-Mayor – Quantum versus Thermal annealing (or D-wave versus Janus): seeking a fair comparison
11:05 – 11:35  COFFEE BREAK

Session: Software and Architecture for Extreme Scale Computing I
11:35 – 12:00  S. Dosanjh
E. Laure
M. Seager – Beowulf meets Exascale System Software: A horizontally integrated framework
12:50 – 13:15  B. Lucas

Session: Software and Architecture for Extreme Scale Computing II
P. Beckman – t.b.a.
J. Shalf – Exascale Programming Challenges: Adjusting to the new normal for computer architecture
L. Kucera
18:15 – 18:45  COFFEE BREAK

Session: Brain-related Simulation and Computing
18:45 – 19:10  K. Amunts – Ultra-high resolution models of the human brain – computational and neuroscientific challenges
19:10 – 19:35  T. Lippert
19:35 – 20:00  B. ter Haar Romeny – Functional models for early vision circuits from first principles
20:00 – 20:10  CONCLUDING REMARKS
Wednesday, July 9th

Session: Beyond Exascale Computing
9:00 – 9:15  P. Messina – Enabling technologies for beyond exascale computing
9:15 – 9:45  R. Stevens – Beyond Exascale: What will Sustain our Quest for Performance in a Post-Moore World?
9:45 – 10:15  M. Dorojevets – Energy-Efficient Superconductor Circuits for High-Performance Computing
10:15 – 10:45  M. Troyer – t.b.a.
10:45 – 11:15  COFFEE BREAK
11:15 – 11:45  M. Moraes
11:45 – 12:15  P. Demichel – New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale
12:15 – 12:45  K. Bergman – Scalable Computing Systems with Optically Enabled Data Movement
12:45 – 13:00  CONCLUDING REMARKS
17:00 – 17:30  R. Wisniewski
17:30 – 18:00  COFFEE BREAK
18:00 – 18:30  T. Sterling
18:30 – 20:00  PANEL DISCUSSION: "Beyond Exascale Computing", organized and chaired by P. Messina. Participants: F. Baetke (Hewlett Packard), P. Coteus (IBM), R. Graham (Mellanox), G. Fox (Indiana University), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), J. Shalf (Lawrence Berkeley Laboratory), V. Voevodin (Moscow State University)
Thursday, July 10th

Session: Cloud Computing Technology and Systems
9:00 – 9:30  J. Qiu
9:30 – 10:00  D. Petcu
10:00 – 10:30  T. Hirofuchi – AIST Super Green Cloud: A build-once-run-everywhere high performance computing platform
10:30 – 11:00  D. Talia
11:00 – 11:30  COFFEE BREAK
11:30 – 12:00  G. Lonsdale – The Fortissimo HPC-Cloud: an enabler for engineering and manufacturing SMEs
12:00 – 12:30  J. L. Vazquez-Poletti
12:30 – 13:00  W. Gentzsch
13:00 – 13:10  CONCLUDING REMARKS

Session: Big Data
17:00 – 17:25  V. Pascucci
17:25 – 17:50  A. Choudhary – BIG DATA + BIG COMPUTE = Power of Two for Scientific Discoveries
17:50 – 18:15  G. Fox
18:15 – 18:45  COFFEE BREAK
18:45 – 19:10  G. Joubert
19:10 – 19:35  E. Van Hensbergen – From Sensors to Supercomputers, Big Data Begins With Little Data
19:35 – 20:00  C. Kesselman – A Software-as-a-Service based approach to Digital Asset Management for Complex Big-Data
CONCLUDING REMARKS
Friday, July 11th

Session: Infrastructures, Solutions and Challenging Applications of HPC, Grids and Clouds
9:00 – 9:30  C. Catlett – New Opportunities for Computation and Big Data in Urban Sciences
9:30 – 10:00  R. Graham
10:00 – 10:30  S. Markidis – Challenges and Roadmap for Scientific Applications at Exascale
10:30 – 11:00  W. Tang – Extreme Scale Computing Advances & Challenges in PIC Simulations
11:00 – 11:30  COFFEE BREAK
11:30 – 12:00  P. Vashishta
12:00 – 12:30  P. Fischer
12:30 – 13:00  V. Voevodin – Medical practice: diagnostics, treatment and surgery in supercomputer centers
13:00 – 13:10  CONCLUDING REMARKS
Chairmen

Gerhard Joubert
Jack Dongarra, Innovative Computing Laboratory
Bill Blake, Cray Inc.
Sudip Dosanjh
Nicolai Petkov, University of Groningen, Groningen, The Netherlands
Geoffrey Fox
Bob Lucas, Computational Sciences Division, Information Sciences Institute
Wolfgang Gentzsch, The UberCloud and EUDAT, formerly SUN Microsystems
Panel Discussion

Paul Messina (Chair)

Participants: F. Baetke (Hewlett Packard), P. Coteus (IBM), R. Graham (Mellanox), G. Fox (Indiana University), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), J. Shalf (Lawrence Berkeley Laboratory), V. Voevodin (Moscow State University)

This panel session will be held at the end of a day dedicated to presentations on the topic "Beyond Exascale Computing." Beyond-exascale systems – as we are defining them – are ones that will be based on new technologies that will finally result in the much anticipated (but unknown) phase-change to truly new paradigms/methodologies. The presentations prior to the panel will have covered promising disruptive technologies and architecture advances that may be enabled as a consequence of technology progress. The focus of this panel is to provide a forum for views on forward-looking technologies that may determine future operational opportunities and challenges for computer systems beyond the exascale regime, what impact they will have on computer architectures and the entire computing ecosystem, and what applications they might enable.
Abstracts

Ultra-high resolution models of the human brain – computational and neuroscientific challenges
Katrin Amunts, Forschungszentrum Jülich

The human brain is characterized by a multi-level organization. Brain models at microscopic resolution provide the bridge between the cellular level of organization and that of cognitive systems. Data size and the complexity of brain organization, however, make it challenging to create them. Cytoarchitectonic mapping strategies, as well as 3D Polarized Light Imaging for analysing nerve fibre bundles and single axons, will be discussed. Models of cellular and fiber architecture will be shown, including a new BigBrain data set, based on advanced ICT, thus opening new perspectives to decode the human brain.
Trends and Paradigm Shifts in High Performance Computing
Frank Baetke, HP Global HPC Programs

HP's HPC product portfolio, based on standards at the processor, node and interconnect levels, has conquered the High Performance Computing market across all industry and application segments. HP's extended portfolio of compute, storage and interconnect components powers most HPC sites in the TOP500 list. For specific challenges at node and system level, HP has introduced the SL-series with proven petascale scalability and leading energy efficiency. The SL-series will continue to lead innovation with the recent addition of new GPU and coprocessor architectures as well as advanced storage subsystems. Power and cooling efficiency has become a new key focus area. Once primarily an issue of cost, it now extends to the power and thermal density that can be managed in a data center. Combining all the associated technological advances will result in a new HPC paradigm shift towards data centers that not only run at extreme efficiencies but also enable extended energy recovery rates. Moving towards exascale computing, we will face additional challenges that again will have a significant impact on how large-scale systems have to be designed.
Scalable Computing Systems with Optically Enabled Data Movement
Keren Bergman, Department of Electrical Engineering

As future computing systems aim to realize exascale performance, the challenge of energy-efficient data movement, rather than computation, is paramount. Silicon photonics has emerged as perhaps the most promising technology to address this challenge by providing ultra-high bandwidth density communication capabilities that are essentially distance independent. Recent advances in chip-scale silicon photonic technologies have created the potential for developing optical interconnection networks that offer highly energy-efficient communications and significantly improve computing performance-per-Watt. This talk will explore the design of silicon photonic interconnected architectures for exascale and their impact on system-level performance.
The Fusion of Supercomputing and Big Data: The Role of Global Memory Architectures in Future Large Scale Data Analytics
Bill Blake, Senior VP and CTO, Cray

High Performance Computing is gaining the capabilities needed to deliver exascale supercomputers with the billion-way parallelism and extreme memory capacity required. At the same time, Big Data approaches to large-scale analytics are pursuing another path, leading to millions of servers and billions of cores in the cloud that deliver results with advanced distributed computing. This paper will explore the technology and architectural trends facing system and application developers and speculate on a future where the most powerful large-scale analytics needs will be met by highly integrated Global Memory Architectures.
BIG DATA + BIG COMPUTE = Power of Two for Scientific Discoveries
Alok N. Choudhary, Henry & Isabelle Dever Professor of EECS, Northwestern University

Knowledge discovery has been driven by theory, by experiments and by large-scale simulations on high-performance computers. Modern experiments and simulations involving satellites, telescopes, high-throughput instruments, sensors, and supercomputers yield massive amounts of data. What has changed recently is that the world is creating massive amounts of data at an astonishing pace and diversity. Processing, mining and analyzing this data effectively and efficiently will be a critical component, as we can no longer rely upon traditional ways of dealing with the data due to its scale and speed. But there is also a social aspect of acceleration, which is the sharing of "big data" and unleashing thousands to ask questions and participate in discovery. This talk addresses the fundamental question "what are the challenges and opportunities for extreme scale systems to be an effective platform" not only for traditional simulations, but also regarding their suitability for data-intensive and data-driven computing to accelerate time to insights.

Biography: Alok Choudhary is the Henry & Isabelle Dever Professor of Electrical Engineering and Computer Science and a professor at the Kellogg School of Management. He is also the founder, chairman and chief scientist (he served as its CEO during 2011-2013) of
Paul Coteus, IBM Fellow; Chief Engineer, Data Centric Deep Computing Systems; Manager, Systems Power, Packaging, and Cooling, IBM Research

I will explain the motivation for IBM's Data Centric Systems, and how it connects the needs of big data and high performance computing. Data Centric Systems are built on the principle that moving computing to the data will lead to more cost-effective, efficient, and easier-to-program systems than in the past. I will explain our vision for Data Centric Computing, covering hardware, software and programming models.
New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale
Patrick Demichel, Strategic System Architect in HPC, Hewlett Packard

We now have a clear path toward the goal of exaflop computing by 2020 for a power budget of ~20 MW. There are still many components to move from labs to development, but now is the time to start the journey toward the zettaflop, to discover the future challenges that we will face and that will require even more creativity and endurance to solve. A zettaflop system opens the door to solving many of the most fundamental problems that our societies face and creates opportunities in particular in climatology, energy, biosciences, security, Big Data, and more. Clearly, power and economic constraints will continuously and exponentially be the key drivers which influence almost all of our choices. In the era of the Internet of Things, with potentially trillions of connected objects and yottabytes of data, we could bring thousands of fundamental breakthroughs in all domains if we know how to extract meaning from the tsunami of information. We need these zettascale systems, as they will be the brain, central to this highly engineered planet.
High Performance Computing Today and Benchmark the Future
Jack Dongarra, Innovative Computing Laboratory

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our numerical scientific software. In addition, benchmarking, and in particular the Linpack Benchmark, must change to match high performance computing today and in the next generation.
Energy-Efficient Superconductor Circuits for High-Performance Computing
Mikhail Dorojevets, Stony Brook University, Dept. of Electrical and Computer Engineering

Superconductor technology offers an opportunity to build processing circuits operating at very high frequencies of 20-50 GHz with ultra-low power consumption. The first generation of such circuits used a logic called Rapid Single-Flux-Quantum (RSFQ) to demonstrate ultra-high clock frequencies. However, RSFQ circuits have significant static power consumption in so-called bias resistors (~100x the dynamic power consumption in Josephson junctions). Recently, the invention of new energy-efficient SFQ logics with practically zero static power dissipation has allowed superconductor designers to switch their focus from high frequencies to energy efficiency. First, I will talk about our design methodology, fabrication and demonstration of several wave-pipelined RSFQ processing units operating at 20 GHz and even higher frequencies. Then, I will discuss our recent work on the design, evaluation and projections for a new generation of energy-efficient superconductor circuits, using a benchmark set of 32-/64-bit integer and floating-point units, register files, and other local storage structures designed for a new superconductor fabrication process to be developed by 2016-2017. Acknowledgment: This research was funded in part by ARO contract W911NF-10-1-0012.
Big Computing, Big Data, Big Science
Sudip Dosanjh, Director, National Energy Research Scientific Computing (NERSC) Center

With more than 5,000 users from universities, national laboratories and industry, the National Energy Research Scientific Computing (NERSC) Center supports the largest and most diverse research community of any computing facility within the U.S. Department of Energy (DOE). We provide large-scale, state-of-the-art computing for DOE's unclassified research programs in alternative energy sources, climate change, energy efficiency, environmental science and other fundamental science areas. NERSC recently installed its newest supercomputing platform, a Cray XC30 system.

NERSC's primary mission is to accelerate scientific discovery at the DOE's Office of Science through high performance computing and data analysis. In 2013 we provided 1.25 billion computer hours to our users, and 2013 proved to be a productive year for scientific discovery at the center. In 2013, our users published 1,977 refereed papers and 18 journal cover stories based on computations performed at NERSC. In addition, long-time NERSC user Martin Karplus, who has been computing at NERSC since 1998, was honored with a Nobel Prize in Chemistry for his contributions to the field of computational chemistry.

A clear trend at NERSC is that a growing number of scientific discoveries involve the analysis of extremely large data sets from experimental facilities. For the last four years, more data has been transferred to NERSC than away from NERSC, representing a paradigm shift for a supercomputing center. Most months we ingest more than a petabyte of data. A few recent data-intensive highlights are:

• Computing the properties of neutrinos from the Daya Bay Neutrino Experiment led to the discovery of a new type of neutrino oscillation, which may help solve the riddle of matter-antimatter asymmetry in the universe (one of Science Magazine's Top 10 Breakthroughs of 2012).
• NERSC's integrated resources and services enabled the earliest-ever discovery of a supernova – within hours of its explosion – providing new information about supernova explosion dynamics.
• The IceCube South Pole Neutrino Observatory made the first observations of high-energy cosmic neutrinos, an achievement enabled in part by NERSC resources (this was Physics World's "Breakthrough of the Year" in 2013).
• Data analyzed from the European Space Agency's Planck space telescope revealed new information about the age and composition of the universe (one of Physics World's "Top 10 Breakthroughs of the Year").
• The South Pole Telescope made the first detection of a subtle twist in light from the Cosmic Microwave Background.
• The Materials Project, one of NERSC's most popular Science Gateways, was featured as a "world changing idea" in a November 2013 Scientific American cover story, "How Supercomputers Will Yield a Golden Age of Materials Science."

The demands for larger and more detailed simulations, massive numbers of simulations, and the explosion in the size and number of experimental data sets mean that there is no end in sight to the need for NERSC resources. This talk will describe NERSC's strategy for bringing together big computing and big data in the next decade to achieve big science.
Scalable Simulations of Multiscale Physics
Paul F. Fischer, Mathematics and Computer Science Division, Argonne National Laboratory

Current high-performance computing platforms feature million-way parallelism, and it is anticipated that exascale computers will feature billion-way concurrency. This talk explores the potential of computing at these scales with a primary focus on fluid flow and heat transfer in application areas that include nuclear energy, combustion, oceanography, vascular flow, and astrophysics. Following Kreiss and Oliger (1972), we argue that high-order methods are essential for efficient simulation of transport phenomena at petascale and beyond. We demonstrate that these methods can be realized at costs equivalent to those of low-order methods having the same number of gridpoints. We further show that, with care, efficient multilevel solvers having bounded iteration counts will scale to billion-way concurrency. Using data from leading-edge platforms over the past 25 years, we analyze the scalability of (low- or high-order) domain decomposition approaches to predict parallel performance on exascale architectures. The analysis sheds light on the expected scope of exascale physics simulations and provides insight into design requirements for future algorithms, codes, and architectures.
Ian Foster, Computation Institute, Argonne National Laboratory and Dept. of Computer Science, The University of Chicago

The US Materials Genome Initiative seeks to develop an infrastructure that will accelerate advanced materials development and deployment. The term Materials Genome suggests a science that is fundamentally driven by the systematic capture of large quantities of elemental data. In practice, we know, things are more complex – in materials as in biology. Nevertheless, the ability to locate and reuse data is often essential to research progress. I discuss here three aspects of networking materials data: data publication and discovery; linking instruments, computations, and people to enable new research modalities based on near-real-time processing; and organizing data generation, transformation, and analysis software to facilitate understanding and reuse.
Returning to Java Grande: High Performance Architecture for Big Data
Geoffrey Fox, Community Grid Computing Laboratory, Indiana University

Here we use a sample of over 50 big data applications to identify characteristics of data-intensive applications and to deduce needed runtimes and architectures. We propose a big data version of the famous
Geoffrey Fox, Community Grid Computing Laboratory, Indiana University

We discuss a variety of large-scale optimization/data analytics problems, including deep learning, clustering, image processing, information retrieval, collaborative filtering and dimension reduction. We describe parallelization challenges and the nature of the kernel operations. We cover both batch and streaming operations and give some measured performance on both MPI and MapReduce frameworks.
UberCloud - from Project to Product
Wolfgang Gentzsch, The UberCloud and EUDAT

On Thursday, June 28, 2012, during the last HPC Workshop in Cetraro, the UberCloud HPC Experiment was announced at http://www.hpcwire.com/2012/06/28/the_uber-cloud_experiment/. The day before, during breakfast on the beautiful terrace of Hotel San Michele, Tom Tabor and Wolfgang Gentzsch crafted the announcement, and Geoffrey Fox was the first to register for an HPC Experiment. Since then, over 1500 organizations and individuals have registered and participated in 148 experiments exploring HPC, CAE, Bio, Finance, and Big Data in the Cloud. Compendiums 1 and 2 appeared with more than 40 case studies reporting on the benefits, challenges, and lessons learned from porting and running engineering and scientific applications in the Cloud. And since then the UberCloud online community and marketplace platform has been founded, with more in preparation. This presentation provides a status of the UberCloud Experiment and of the online community and marketplace platform, discusses challenges and lessons learned, and presents several case studies.
Towards High-Level Programming for Many-Cores
Sergei Gorlatch, Universitaet Muenster, Institut für Informatik, Germany

Application development for modern high-performance systems with many cores, i.e., comprising multiple Graphics Processing Units (GPUs) and multi-core CPUs, currently exploits low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we advocate a high-level programming approach for such systems, which relies on the following two main principles: a) the model is based on the current OpenCL standard, such that programs remain portable across various many-core systems, independently of the vendor, and all low-level code optimizations can be applied; b) the model extends OpenCL with three high-level features which simplify many-core programming and are automatically translated by the system into OpenCL code. The high-level features of our programming model are as follows: 1) memory management is simplified and automated using parallel container data types (vectors and matrices); 2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability on multiple GPUs; 3) computations are conveniently expressed using parallel algorithmic patterns (skeletons). The well-defined skeletons allow for formal transformations of SkelCL programs, which can be used both in the process of program development and in the compilation and optimization phase. We demonstrate how our programming model and its implementation are used to express parallel applications on one- and two-dimensional data, and we report first experimental results to evaluate our approach in terms of programming effort and program size, as well as target performance.
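As a side illustration of the skeleton idea described above, here is a minimal sketch in Python (the function names and the container type are hypothetical; SkelCL itself is a C++ library on top of OpenCL): container data types plus map/zip/reduce patterns take the place of explicit loops and memory management in application code.

```python
# Minimal sketch of skeleton-based programming (illustrative only; SkelCL
# itself is a C++/OpenCL library, and the names here are hypothetical).
from functools import reduce

class Vector:
    """Parallel container type; here just a thin wrapper over a Python list."""
    def __init__(self, data):
        self.data = list(data)

def map_skel(f, vec):
    # 'map' skeleton: apply f to every element; a real implementation would
    # generate OpenCL code and distribute the work across GPUs.
    return Vector(f(x) for x in vec.data)

def zip_skel(f, a, b):
    # 'zip' skeleton: combine two vectors elementwise.
    return Vector(f(x, y) for x, y in zip(a.data, b.data))

def reduce_skel(f, vec, init):
    # 'reduce' skeleton: fold all elements into a single value.
    return reduce(f, vec.data, init)

# Example: a dot product expressed purely with skeletons.
a = Vector([1.0, 2.0, 3.0])
b = Vector([4.0, 5.0, 6.0])
dot = reduce_skel(lambda s, x: s + x, zip_skel(lambda x, y: x * y, a, b), 0.0)
print(dot)  # 32.0
```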
Richard Graham, Mellanox

Exascale levels of computing pose many system- and application-level computational challenges. Mellanox Technologies, Inc., as a provider of end-to-end communication services, is progressing the foundation of the InfiniBand architecture to meet the exascale challenges. This presentation will focus on recent technology improvements which significantly improve InfiniBand's scalability, performance, and ease of use.
Functional models for early vision circuits from first principles
Bart M. ter Haar Romeny, Dept. of Biomedical Engineering – Biomedical Image Analysis, Eindhoven University of Technology

There are many approaches to model functional understanding of early vision mechanisms: by (large-scale) numerical simulations, by neuro-computational mathematical modeling, by plasticity learning rules, and by pattern recognition paradigms, among others. This presentation will focus on geometrical models for the visual front-end: the lowest level (V1) is considered as a geometry inference engine, with its extensive filterbanks, with a gradual increase in functional complexity to higher level operations up to V4 (Azzopardi, Petkov) with non-local topological models. It is of interest to study the emergence and presence of known receptive fields and their interactions with a first-principled (axiomatic) approach. We will discuss in detail how the optimal aperture can be modeled as a Gaussian, from a minimum entropy requirement, and how the diffusion equation, first discussed by Koenderink, emerges. This PDE gives rise to a new model for the center-surround receptive fields in the retina as filters that only signal back as interesting those locations that respond to local variations in receptive field size. We discuss how Gaussian derivative kernels may emerge from a PCA analysis of eigen-patches of an image, generating robust differential operators both for invariant shape detection and the measurement of color differential structure. We discuss in detail the optimal regularization properties of the Gaussian multi-scale differential operator receptive fields, as an instance of Tikhonov regularization. Making the requirements locally adaptive leads to non-linear diffusion PDEs, inspired by the strong cortico-thalamic feedback. The famous pinwheel structure may be modeled as stacks of multi-orientation filtered outputs, so-called orientation scores. Assuming invertibility as a first principle, a new robust family of wavelets is found to generate these scores uniquely, which are similar to but not the same as Gabor kernels. These new 'spaces' give ample opportunities for more contextual image analysis operations, like denoising branching vessel patterns, and enhancement and analysis of multiply crossing brain tracts from diffusion tensor imaging.

Short bio: Prof. Bart M. ter Haar Romeny is professor at Eindhoven University of Technology. He is President of the Dutch Society for Pattern Recognition and Image Processing, and has been President of the Dutch Society for Biophysics & Biomedical Engineering (1998-2002) and the Dutch Society of Clinical Physics (NVKF, 1990-1992). He initiated the 'Scale-Space' conference series in 1997 (now SSVM). He is a reviewer for many journals and conferences, and has organized several Summer Schools. He is an awarded teacher and a frequent keynote lecturer. Prof. Romeny is a Senior Member of IEEE, Board member of IAPR, registered Clinical Physicist of NVKF, and partner in the Chinese Brainnetome consortium.
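As a small numerical aside to the Gaussian-aperture discussion in the abstract above, the following sketch (not the speaker's code; NumPy-based, with an arbitrary scale sigma) shows a Gaussian and its first derivative used as a regularized differential operator that responds at a step edge.

```python
# Minimal numerical sketch: a Gaussian aperture and its first derivative
# used as a scale-space differential operator in 1-D. Illustrative only.
import numpy as np

def gaussian(x, sigma):
    # Normalized Gaussian kernel G(x; sigma).
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def gaussian_dx(x, sigma):
    # First spatial derivative of the Gaussian: d/dx G(x; sigma).
    return -x / sigma**2 * gaussian(x, sigma)

sigma = 2.0
x = np.arange(-10, 11)                       # discrete support of the kernel
signal = np.zeros(200)
signal[100:] = 1.0                           # a step edge at index ~100

# Convolving with the Gaussian derivative gives a regularized (well-posed)
# derivative of the signal: it peaks at the location of the edge.
edge_response = np.convolve(signal, gaussian_dx(x, sigma), mode="same")
print(int(np.argmax(edge_response)))         # prints the edge location (about 100)
```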
AIST Super Green Cloud: A build-once-run-everywhere high performance computing platform
Takahiro Hirofuchi, Senior Researcher, Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)

AIST Super Green Cloud (ASGC) is a high performance computing cloud platform built on a supercomputer at the National Institute of Advanced Industrial Science and Technology (AIST). It aims at providing users with fully-customizable and highly-scalable high performance computing environments by taking advantage of advanced virtualization technologies and resource management mechanisms. Thanks to virtual machine technologies, users obtain their own virtualized supercomputers on physical resources in a build-once-run-everywhere manner; a virtualized supercomputer is portable and able to scale out to other commercial and academic cloud services. To overcome the performance overhead incurred by hypervisors, we have developed hypervisor-bypass I/O technologies and integrated them into Apache CloudStack. Although user environments are fully virtualized, performance degradation from bare-metal environments is negligible. From the administrative side, ASGC achieves energy-efficient and flexible management of physical resources by means of dynamic server placement. We have developed advanced migration technologies that work efficiently for HPC and enterprise virtual machines involving intensive memory and network I/O. This also enables us to migrate VMs among geographically distant HPC clouds. In this talk, I will present an overview of the ASGC project and our first experience obtained since launching the service in June 2014.
Gerhard Joubert

The general concept of the scientific method or procedure consists in systematic observation, experiment and measurement, and the formulation, testing and modification of hypotheses. The method applies in practice to the solution of any real-world problem. The order in which the two distinct components of the scientific method are applied in practice depends on the nature of the particular problem considered. In many cases a hypothesis is formulated in the form of a model, for example a mathematical or simulation model. The correctness of a solution of the problem produced by this model is then verified by comparing it with collected data. Alternatively, observational data are collected without a clear specification that the data could apply to the solution of one or more particular problems. This is, for example, often the case with medical data. In such cases data analytics are used to extract relationships from and detect structures in the (large) data sets. These can then be used to formulate one or more hypotheses, i.e. models, which lead to a deeper insight into the problem(s) considered.

Since the advent of the widespread interest in so-called Big Data, there is a growing tendency to consider the results obtained through the analysis of large data sets in their own right, supplying satisfactory solutions to particular problems without the need for hypotheses and models. A notion is thus developing that the scientific method is becoming obsolete in the case of Big Data. This ignores the fact that a deeper understanding of the problem(s) considered may lead to different and more accurate solutions. In this talk the relationship between data and models is briefly outlined and illustrated in the case of a common problem. The limitations of results obtained with data analyses, without gaining the insight resulting from appropriate models, which is fundamental to the scientific method, are exemplified. Considering Big Data in the context of the scientific method, one can state that the objective should be to gain "Insight, not Data".
A Software-as-a-Service based approach to Digital Asset Management for Complex Big-Data
Carl Kesselman

Trends in big data are resulting in large, complex data sets that are delivered in a wide variety of forms from diverse instruments, locations and sources. Simply keeping track of and managing the deluge of data (i.e. data wrangling) can be overwhelming, let alone integrating the data into ongoing scientific activities, thus imposing overheads that often slow or even impede producing research results (1). In response, we have developed the Digital Asset Management System (BDAM) with the goal of drastically reducing the amount of time researchers currently spend managing their data rather than extracting knowledge. This "iPhoto" for big data is delivered via a software-as-a-service model, and we have demonstrated its use in a variety of big-data management problems. I will describe the motivation and architecture of our system and illustrate it with an example from biomedical science.
A lower bound to energy consumption of an exascale computer
Ludek Kucera, Faculty of Mathematics and Physics

Building a computer system with computing power exceeding 1 exaflops is one of the greatest challenges of modern computer technology. There are two very important problems on the way to such a system: money and energy. Extrapolating the known data about Titan, the second most powerful system to date (and the largest for which the data are published), namely 17.59 petaflops (LINPACK), a cost of 97 million dollars and 8.2 MW of power, we get 5.5 billion dollars and 466 MW for an exaflops machine. Both values are feasible, but there is a consensus that the future exascale computer should be much more efficient.

The approach of the present paper is to find ways to obtain a lower bound to the achievable power, in order to understand how roughly 500 MW of power is (or can be) distributed to different tasks performed by the computer. The first step is to figure out how much energy is necessary to perform $10^{18}$ 64-bit floating-point multiplications, excluding any other operations like getting the operands from a cache or a memory. A very rough estimate can be based on the fact that the standard $O(n^2)$ multiplication algorithms couple each bit of the first operand with each bit of the second operand, and hence we can expect at least about 4000 bit operations per floating-point multiplication. Taking into account that recent semiconductor technology requires about 1 femtojoule per voltage-level change of one bi-state element, and assuming that about one half of the bi-state elements of a multiplier change their state, the lower bound (under recent technology) for one 64-bit floating-point multiplication can be set to about 2 picojoules ($2 \times 10^{-12}$ J). It follows that $10^{18}$ multiplications per second would need more than 2 MW. This value is substantially less than 1 percent of the extrapolated 500 MW, and therefore it would be very important to know where the remaining power disappears and whether there are ways to decrease this overhead.

The next step is to investigate the traffic between multipliers (and other arithmetic units) and cache and/or memory. Assuming a feasible value of 1 femtojoule for transferring one bit along

The present paper is a report on on-going research (better said, research that is just starting), and our next goal is to investigate the energy requirements for communication among different chips (ranging from data transfer between two neighboring chips of one board to a transfer from one corner of the building to the opposite corner, but always including a passage through chip pins and their drivers, which is much more energy-hungry than in-chip communication). This problem is much more complex, because different problems have quite different locality, i.e. different amounts of long-distance data traffic within a computer system.

The conclusion is that the arithmetic operations and the communication of arithmetic circuits with the on-chip cache account (or could account) for a very small part of the power consumption of a recent supercomputer, and this opens a wide potential space for energy savings by reducing the computing overhead and off-chip communication. To achieve this goal we have to better understand the energy requirements of the different overhead activities of recent supercomputers.
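The back-of-the-envelope figures quoted in this abstract can be reproduced directly; the sketch below uses only the numbers and assumptions stated above (the per-multiplication bit-operation count, the femtojoule-level switching energy, and the Titan extrapolation).

```python
# Reproducing the back-of-the-envelope numbers quoted in the abstract above.
# All inputs are the abstract's own figures and assumptions, not measured data.

# Linear extrapolation from Titan (17.59 PF LINPACK, $97M, 8.2 MW) to 1 exaflops.
titan_pflops, titan_cost_usd, titan_power_mw = 17.59, 97e6, 8.2
scale = 1000.0 / titan_pflops                # 1 exaflops = 1000 petaflops
print(scale * titan_cost_usd / 1e9)          # ~5.5 (billion dollars)
print(scale * titan_power_mw)                # ~466 (MW)

# Lower bound for the multiplications alone.
bit_ops_per_multiply = 4000                  # O(n^2) coupling of 64-bit operands
energy_per_bit_op = 1e-15                    # ~1 femtojoule per level change
switching_fraction = 0.5                     # ~half the bi-state elements switch
energy_per_multiply = bit_ops_per_multiply * energy_per_bit_op * switching_fraction
print(energy_per_multiply)                   # 2e-12 J, i.e. about 2 picojoules

# 10^18 multiplications per second -> power in megawatts.
print(energy_per_multiply * 1e18 / 1e6)      # 2.0 MW, well under 1% of ~500 MW
```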
Marcel Kunze, Forschungsgruppe Cloud Computing, Germany

The talk addresses the technical foundations and the non-technical framework of Big Data. A new era of data analytics promises tremendous value on the basis of cloud computing technology. Can we perform predictive analytics in real time? Can our models scale to rapidly growing data? The "Smart Data Innovation Lab" at KIT addresses these challenges by supporting R&D projects to be carried out in close cooperation between industry and science (http://www.sdil.de/en/). Some practical examples as well as open research questions are discussed.
EPiGRAM - Towards Exascale Programming Models
Erwin Laure, KTH Royal Institute of Technology

Exascale computing is posing many challenges, including the question of how to efficiently program systems exposing hundreds of millions of parallel activities. The Exascale Programming Models (EPiGRAM) project is addressing this challenge by improving one of the most widely used programming models, message passing, considering also the impact of PGAS approaches. In this talk we will discuss the exascale programming challenge, motivate our choices of message passing and PGAS, discuss initial findings on scalability limits, and propose directions to overcome them.
Programming Challenges in Future Memory Systems
John Leidel, Micron Technology, Inc.

Given the recent hurdles associated with the pursuit of power-performance scaling in traditional microprocessors, we have witnessed a resurgence in research associated with the overall memory hierarchy. The traditional symbiotic relationship of fast, multi-level caches and larger, DRAM-based main memories has given way to complex relationships between software-managed scratchpads, high-bandwidth memory protocols and the use of non-volatile memories in the shared address space. Furthermore, manufacturing technologies such as through-silicon-via (TSV) techniques have begun to blur the lines between processors and memory devices. The result is a significant challenge for those constructing compiler, runtime, programming model and application technology to address the ever-increasing heterogeneity of future system architectures. In this talk, we outline the pitfalls and potential solutions for the programming challenges of future system architectures based upon highly diverse memory technologies.
Creating the HPC Infrastructure for the Human Brain Project
Thomas Lippert, Juelich Supercomputing Centre

The Human Brain Project, one of two European flagship projects, is a collaborative effort to reconstruct the brain, piece by piece, in multi-scale models and their supercomputer-based simulation, integrating and federating giant amounts of existing information and creating new information and knowledge about the human brain. A fundamental impact on our understanding of the human brain and its diseases, as well as on novel brain-inspired computing technologies, is expected. The HPC Platform will be one of the central elements of the project. Including major European supercomputing centres and several universities, its mission is to build, integrate and operate the hardware, network and software components of the supercomputing and big data infrastructures, from the cell to full-scale interactive brain simulations, with data management, processing and visualization.

In my contribution, I will discuss the requirements of the HBP on HPC hardware and software technology. These requirements follow the multi-scale approach of the HBP to decode the brain and recreate it virtually. On the cellular level, hardware-software architectures for quantum mechanical ab-initio molecular dynamics methods and for classical molecular dynamics methods will be included in the platform. On the level of the full-scale brain simulation, on the one hand, a development system to "build" the brain by integration of all accessible data distributed worldwide, as well as for tests and evaluation of the brain software, is foreseen; on the other hand, a system that acts as the central brain simulation facility will eventually allow for interactive simulation and visualization of the entire human brain. Additionally, the brain needs to be equipped with the proper sensory environment, a body, provided by virtual robotics codes developed on a suitable hardware system. It is expected that the Human Brain Project can trigger innovative solutions for future exascale architectures permitting hierarchical memory structures and interactive operation.
The Fortissimo HPC-Cloud: an enabler for engineering and manufacturing SMEs
Guy Lonsdale, scapos AG, Sankt Augustin

The Fortissimo project¹ is funded under the European Commission's 7th Framework Programme and is part of the I4MS (ICT Innovation for Manufacturing SMEs) group of projects within the Factories of the Future initiative. Fortissimo's principal objective is to enable engineering and manufacturing SMEs to benefit from the use of HPC and digital simulation. While the importance of advanced simulation to the competitiveness of both large and small companies is well established, its broader industrial take-up requires supportive actions, since digital simulation requires significant computing power and specialised software tools and services. Generally, large companies, which have a greater pool of skills and resources, find access to advanced simulation easier than SMEs, which can neither afford expensive High Performance Computing (HPC) equipment nor the licensing cost for the relevant tools. This means that SMEs are not able to take advantage of advanced simulation, even though it can clearly make them more competitive. The goal of Fortissimo is to overcome this impasse through the provision of simulation services and tools running on a cloud infrastructure. A "one-stop shop" will greatly simplify access to advanced simulation, particularly for SMEs. This will make hardware, expertise, applications, visualisation and tools easily available and affordable on a pay-per-use basis. In doing this, Fortissimo will create and demonstrate a viable and sustainable commercial ecosystem.

Fortissimo will be driven by end-user requirements: approximately 50 business-relevant application experiments will serve to develop, test and demonstrate both the infrastructure and the Fortissimo marketplace. 20 experiments – all HPC-cloud-based – have already been defined in fields such as the simulation of continuous and die casting, environmental control and urban planning, and aerodynamic design and optimisation. A second wave of 22 new experiments is set to commence as a result of the first open call, which broadens the engineering and manufacturing applications from an extended range of industrial sectors. Amongst the new partners joining the project are a total of 34 SMEs, solving core business challenges with the support of application-domain and HPC experts and resources.

¹ FP7 Project 609029, project title: FORTISSIMO: Factories of the Future Resources, Technology, Infrastructure and Services for Simulation and Modelling.
Accelerating the Multifrontal Method
Robert Lucas, Computational Sciences Division, Information Sciences Institute

Adiabatic Quantum Annealing Update
Robert Lucas, Computational Sciences Division, Information Sciences Institute

Two years ago, at HPC 2012, it was reported that the USC -
Challenges and Roadmap for Scientific Applications at Exascale
Stefano Markidis, KTH Royal Institute of Technology

One of the main challenges for scientific applications on exascale supercomputers will be to deal with an unprecedented amount of parallelism. Current studies of exascale machines estimate a total of a billion processes available for computation. Collective communication and synchronization of such a large number of processes will constitute a bottleneck, and system noise might be amplified by non-blocking communication, making the use of exascale machines ineffective. In this talk, we discuss the challenges that have been identified by studying the communication kernels of two applications from the EC-funded EPiGRAM project: the Nek5000 and iPIC3D codes. Nek5000 is a Computational Fluid Dynamics Fortran code based on the spectral element method to solve the Navier-Stokes equations in the incompressible limit. The iPIC3D code is a C++ Particle-in-Cell code used for space weather. Both application communication kernels are based on MPI. The communication kernels of the two applications, and simulations of them on very large numbers of processes with varying interconnection network latency and bandwidth, are presented. A roadmap to bring these applications to exascale will finally be discussed.
Quantum versus Thermal annealing (or D-wave versus Janus): seeking a fair comparison
Victor Martin-Mayor, Departamento de Fisica Teorica, Universidad Complutense de Madrid, Spain & Janus Collaboration

The D-Wave Two machine presumably exploits quantum annealing effects to solve optimization problems. One of the preferred benchmarks is the search for ground states of spin glasses, one of the most computationally demanding problems in Physics. In fact, the Janus computer has been specifically built for spin-glass simulations. Janus has allowed the time scale of classical simulations to be extended by a factor of 1000, thus setting the standard against which D-Wave should be measured. Whether D-Wave's quantum annealing achieves a real speed-up as compared to classical (thermal) annealing(a) or not is a matter of investigation. A difficulty lies in the topology of the device (the chimera lattice), where hard-to-solve instances are extremely rare for a small system. However, general physical arguments (temperature chaos) tell us that, given a large enough number of q-bits, rough free-energy landscapes should be the rule, rather than the exception. The rough landscape implies that simulated annealing will get trapped in local minima and thus be ineffective. Therefore, the meaningful question is: how well does quantum annealing perform on those instances displaying temperature chaos? For a small number of q-bits, temperature chaos is rare but fortunately not nonexistent. The talk describes a program to identify chaotic instances with only 503 q-bits by means of state-of-the-art methods (multi-spin coding, parallel tempering simulations and the related stochastic time-series analysis). The performance of both thermal annealing (Janus) and quantum annealing (D-Wave) will be assessed over this set of samples. This is joint work with the Janus Collaboration and Itay Hen (Information Sciences Institute, USC).

(a) By thermal annealing we refer to a refined form of simulated annealing named parallel tempering (also known as exchange Monte Carlo).
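For readers unfamiliar with the method mentioned in the footnote, parallel tempering can be sketched in a few lines: several replicas of one spin-glass instance are simulated at different temperatures, and neighbouring temperatures periodically attempt to swap configurations with acceptance probability min(1, exp((beta_i - beta_j)(E_i - E_j))). The toy example below is illustrative only and is unrelated to the Janus codes.

```python
# Minimal parallel-tempering (exchange Monte Carlo) sketch on a toy 1-D
# spin glass; illustrative only, not the Janus collaboration's code.
import math, random

random.seed(0)
N = 32                                               # number of spins
J = [random.choice((-1.0, 1.0)) for _ in range(N)]   # random couplings, periodic chain
betas = [0.2, 0.5, 1.0, 2.0]                         # inverse temperatures of the replicas
replicas = [[random.choice((-1, 1)) for _ in range(N)] for _ in betas]

def energy(s):
    # E = -sum_i J_i * s_i * s_{i+1} with periodic boundary conditions.
    return -sum(J[i] * s[i] * s[(i + 1) % N] for i in range(N))

def metropolis_sweep(s, beta):
    for _ in range(N):
        i = random.randrange(N)
        # Energy change of flipping spin i (its two adjacent bonds).
        dE = 2 * s[i] * (J[i - 1] * s[i - 1] + J[i] * s[(i + 1) % N])
        if dE <= 0 or random.random() < math.exp(-beta * dE):
            s[i] = -s[i]

for step in range(2000):
    for s, beta in zip(replicas, betas):
        metropolis_sweep(s, beta)
    # Attempt to exchange configurations between neighbouring temperatures.
    k = random.randrange(len(betas) - 1)
    d_beta = betas[k] - betas[k + 1]
    d_energy = energy(replicas[k]) - energy(replicas[k + 1])
    if random.random() < math.exp(min(0.0, d_beta * d_energy)):
        replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]

print(energy(replicas[-1]))  # lowest-temperature replica: close to the ground-state energy
```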
Convergence of Extreme Big Data and HPC - managing the memory hierarchy and data movement, the key towards future exascale
Satoshi Matsuoka, Global Scientific Information and Computing Center & Department of Mathematical and Computing Sciences, Tokyo Institute of Technology

Big data applications such as healthcare, systems biology, social networks, business intelligence, and electric power grids require fast and scalable data analytics capability, posing significant opportunities for HPC, as evidenced by the recent attention to the Graph500 list and the Green Graph500 list. In order to cope with the massive capacity requirements of such big data applications, emerging NVM (Non-Volatile Memory) devices, such as Flash, realize low cost and high energy efficiency compared to conventional DRAM devices, at the expense of throughput and latency, requiring a deepening of the memory hierarchy. As such, effective abstractions and efficient implementation techniques for big-data algorithms and data structures that overcome the deepening memory hierarchy are becoming essential. Our recent project JST-CREST EBD (Extreme Big Data) aims to come up with a big data / HPC convergence architecture that provides such algorithms and abstractions. In particular, our objective is to control the deep memory hierarchy and data movement effectively to achieve a tremendous boost in performance per resource cost (power, dollars, etc.), which is becoming the dominating metric in future exascale supercomputers as well as big data. Although we are still in the early stages of our research, we have already achieved several results, such as a novel graph data offloading technique using NVMs for the hybrid BFS (Breadth-First Search) algorithm widely used in the Graph500 benchmark, achieving 4.35 MTEPS/Watt on a Scale 30 problem, ranked 4th in the big data category of the Green Graph500 (November 2013).
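As background for the hybrid BFS mentioned above: Graph500 implementations commonly use a direction-optimizing BFS (in the style of Beamer et al.) that switches between top-down frontier expansion and a bottom-up sweep once the frontier grows large. The sketch below is a minimal serial illustration with a simplified switching heuristic; it is not the EBD project's implementation.

```python
# Minimal serial sketch of hybrid (top-down / bottom-up) BFS; the switching
# rule is a simplified illustrative heuristic, not a tuned Graph500 policy.
from collections import defaultdict

def hybrid_bfs(adj, source, alpha=4):
    parent = {source: source}
    frontier = {source}
    total_edges = sum(len(adj[v]) for v in adj)
    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        next_frontier = set()
        if frontier_edges * alpha > total_edges:
            # Bottom-up step: every unvisited vertex scans its neighbours
            # looking for one that is already in the frontier.
            for v in adj:
                if v not in parent:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            next_frontier.add(v)
                            break
        else:
            # Top-down step: expand the frontier along outgoing edges.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier
    return parent

# Tiny undirected example graph.
adj = defaultdict(list)
for a, b in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
    adj[a].append(b)
    adj[b].append(a)
print(hybrid_bfs(adj, 0))  # parent pointers of the BFS tree rooted at 0
```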
Prospects for the Monte Carlo Methods in the Million Processor-core Era and Beyond
Kenichi Miura, Ph.D., National Institute of Informatics

With the recent trends in HPC architecture toward higher and higher degrees of parallelism, some of the traditional numerical algorithms need to be reconsidered due to their poor scalability. This is due to (1) declining memory capacity per CPU core, (2) limits in the inter-processor communication bandwidth (bytes/flop ratio), (3) fault-tolerance issues, etc. The Monte Carlo Methods (MCMs) are numerical methods based on statistical sampling, and were systematically studied in the early days of computing in various application areas. The MCMs have properties which match very well with the above-mentioned trends in HPC architecture. They are: (1) an inherently high degree of parallelism, (2) small memory requirements per processor, and (3) natural resilience due to their statistical approach. In the MCMs, a good pseudo-random number generator is essential. The requirements are: (1) a long period, (2) good statistical characteristics (both within a processor and across processors), (3) fast generation of the random sequence, and (4) low overhead in initializing the generators on each processor. The last requirement is particularly important when the number of processors is very large. I have been proposing one such generator, called MRG8, with these properties. In my talk, I will discuss the prospects and issues of the MCMs as well as propose widening application areas in the million-core era and beyond.
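As a concrete, if trivial, illustration of the properties listed above, the sketch below estimates pi with independent workers: each worker carries only a tiny amount of state, uses its own random stream, and communicates a single count at the end, so losing a worker would merely enlarge the statistical error. The per-worker seeding is a simple stand-in, not the MRG8 generator discussed in the talk.

```python
# Toy parallel Monte Carlo estimate of pi, illustrating the properties above:
# independent streams, tiny per-worker state, and a single final reduction.
# The per-worker seeding is a simple stand-in, not the MRG8 generator.
import random
from multiprocessing import Pool

def worker(args):
    worker_id, samples = args
    rng = random.Random(worker_id)   # independent stream per worker
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits                      # only one integer is communicated back

if __name__ == "__main__":
    workers, samples_each = 8, 100_000
    with Pool(workers) as pool:
        hits = pool.map(worker, [(i, samples_each) for i in range(workers)])
    # If a worker were lost, dropping its contribution would only enlarge
    # the statistical error, not invalidate the result.
    print(4.0 * sum(hits) / (workers * samples_each))   # ~3.14
```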
Scaling lessons from the software challenges in Anton, a special-purpose machine for molecular dynamics simulation
Mark Moraes, Head, Engineering Group, D. E. Shaw Research

Anton is a massively parallel special-purpose machine that accelerates molecular dynamics simulations by orders of magnitude compared with the previous state of the art. The hardware architecture, many of the algorithms, and all of the software were developed specifically for Anton. We exploit the highly specialized nature of the hardware to gain both weak and strong scaling of molecular dynamics simulations. However, the tradeoffs involved in specialized hardware create many interesting challenges for software. This talk describes how we tackled these challenges and the techniques used to achieve efficient scaling of simulation performance on two successive generations of a special-purpose machine.
Valerio Pascucci Director, Center
for Extreme Data Management Analysis and Visualization Professor, School of Computing, University of Utah; Laboratory Fellow; CTO, ViSUS Inc.
(visus.net) We live in the era of Big Data, which is characterized by an
unprecedented increase in information generated from many sources including (i) massive simulations for science and engineering, (ii)
sensing devices for experiments and diagnostics, and (iii) records of
people's activities left actively or passively primarily on the web. This is
a gift to many disciplines in science and engineering since it will
undoubtedly lead to a wide range of new amazing discoveries. This is also changing
the nature of scientific investigation, combining theory, experiments, and
simulations with the so-called “fourth paradigm” of data-driven discovery.
Interdisciplinary work, traditionally confined to a few heroic efforts, will
become a central requirement in most research activities since progress can
only be achieved with a combination of intense computing infrastructures and
domain expertise. For example, computational efforts can only be validated in
the proper application context, such as in climate modeling,
biology, economics, and social sciences, to name just a few. In this talk I will discuss some of the experiences in Big Data
discovery that have driven the activities at the Center
for Extreme Data Management Analysis and Visualization. The technical work,
for example, is systematically reshaped to involve integrated use of multiple
computer science techniques such as data management, analytics, high
performance computing, and visualization. Research agendas are motivated by
grand challenges, for instance, the development of new, sustainable energy
sources, or predicting and understanding climate change. Furthermore, such
efforts rely on multi-disciplinary partnerships with teams that extend across
academia, government laboratories and industry. Overall, the great
opportunities of Big Data research come with great challenges in terms of how
we reshape scientific investigation, collaborations across disciplines, and
how we educate the future generations of scientists and engineers. BIOGRAPHY Valerio Pascucci is the founding Director of the Center for Extreme Data Management Analysis and Visualization (CEDMAV) of the University of Utah. |
Scalability in the Cloud: HPC Convergence
with Big Data in Design, Engineering, Manufacturing David Pellerin AWS High Performance Computing HPC in the cloud is now a common approach for research
computing, and is becoming mainstream as well for commercial HPC across
industries. Cloud enables the convergence of big data analytics, for example
Hadoop, with scalable and low-cost computing, allowing commercial firms to
perform analytics that would not otherwise be practical. Cloud allows
customized, application-specific HPC clusters to be created, used, and
decommissioned in a matter of minutes or hours, enabling entirely new kinds
of parallel applications to be run against widely diverse datasets. Cloud
also enables global collaboration for distributed teams, using remote
visualization and remote login methods. Scalability in the cloud provides HPC
users with large amounts of computing power, but also requires new thinking
about application fault-tolerance, cluster right-sizing, and data storage
architectures. This session will provide an overview of current cloud
capabilities and best practices for scalable HPC and for global
collaboration, using specific use-cases in design, engineering, and manufacturing. |
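The create/use/decommission lifecycle described above can be sketched, under the assumption of the boto3 Python SDK and placeholder parameter values (AMI ID, instance type, node count), roughly as follows; a production HPC cluster would additionally configure placement groups, networking, storage, and a job scheduler.

# Minimal sketch of an on-demand, application-specific cluster lifecycle.
# The AMI ID, instance type, and counts below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Create the cluster on demand.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",        # placeholder: custom HPC image
    InstanceType="c3.8xlarge",     # placeholder: compute-optimized nodes
    MinCount=16,
    MaxCount=16,
)
ids = [i["InstanceId"] for i in resp["Instances"]]

# 2. ... run the tightly-coupled or analytics workload against the cluster ...

# 3. Decommission the cluster when the job is done, paying only for use.
ec2.terminate_instances(InstanceIds=ids)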
Overcoming the Cloud heterogeneity: from
uniform interfaces and abstract models to multi-cloud platforms Dana Petcu Research Institute e-Austria Cloud heterogeneity is manifested today in the set of interfaces of the services from different Public Clouds, in the set of services from the same provider, in the software or hardware stacks, and in terms of performance or user quality of experience. This heterogeneity favors the Cloud service providers, allowing them to be competitive in a very dynamic market, especially by exposing unique solutions. However, such heterogeneity hinders the interoperability between these services and the portability of the applications consuming the services, as well as the seamless migration of legacy applications towards Cloud environments. Various solutions to overcome Cloud heterogeneity have been investigated in the last half decade, starting from the definition of uniform interfaces (capturing the commonalities, but losing the specificities) and arriving at domain-specific languages (allowing applications to be conceived at a Cloud-agnostic level, but introducing a high overhead). We discuss the existing approaches and their completeness from the perspective of building support platforms for Multi-Clouds, identifying the gaps and potential solutions. Concrete examples are taken from recent experiments in developing Multi-Cloud platform prototypes: mOSAIC [1] for uniform interfaces, MODAClouds [2] for domain-specific languages, SPECS [3] for user quality of experience, and HOST [4] for the usage of Cloud HPC services.
References:
[1] D. Petcu, B. Di Martino, S. Venticinque, M. Rak, T. Máhr, G. Esnal Lopez, F. Brito, R. Cossu, M. Stopar, S. Sperka, V. Stankovski, Experiences in Building a mOSAIC of Clouds, Journal of Cloud Computing: Advances, Systems and Applications, 2:12, online May 2013, doi: 10.1186/2192-113X-2-12
[2] D. Ardagna, E. Di Nitto, G. Casale, D. Petcu, P. Mohagheghi, S. Mosser, P. Matthews, A. Gericke, C. Ballagny, F. D'Andria, C.S. Nechifor, C. Sheridan, MODACLOUDS: A Model-Driven Approach for the Design and Execution of Applications on Multiple Clouds, Procs. MISE 2012, 50-56, doi: 10.1109/MISE.2012.6226014
[3] M. Rak, N. Suri, J. Luna, D. Petcu, V. Casola, U. Villano, Security as a Service Using an SLA-Based Approach via SPECS, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), vol. 2, 1-6, 2-5 Dec. 2013, doi: 10.1109/CloudCom.2013.165
[4] M.E. Frincu, D. Petcu, Resource Management for HPC on the Cloud, in: Emmanuel Jeannot, Julius Zilinskas (eds.), High-Performance Computing on Complex Environments, ISBN 978-1-118-71205-4, June 2014, 303-323 |
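The "uniform interface" approach discussed above can be illustrated with a small, purely hypothetical sketch (it is not mOSAIC's actual API): a provider-agnostic facade captures the commonalities, while provider-specific adapters absorb, and therefore hide, the specificities of each Cloud.

# Hypothetical illustration of a uniform, Cloud-agnostic compute interface.
from abc import ABC, abstractmethod

class ComputeService(ABC):
    """Uniform interface: only the commonalities of all providers."""
    @abstractmethod
    def provision(self, cores: int, memory_gb: int) -> str: ...
    @abstractmethod
    def release(self, node_id: str) -> None: ...

class ProviderAAdapter(ComputeService):
    def provision(self, cores, memory_gb):
        # would call provider A's proprietary API here
        return "providerA-node-1"
    def release(self, node_id):
        pass  # provider A teardown call

class ProviderBAdapter(ComputeService):
    def provision(self, cores, memory_gb):
        # provider B only offers fixed instance sizes: a "lost specificity"
        return "providerB-node-1"
    def release(self, node_id):
        pass  # provider B teardown call

def deploy(service: ComputeService):
    node = service.provision(cores=8, memory_gb=32)
    # ... the application consumes the node through the uniform interface ...
    service.release(node)

deploy(ProviderAAdapter())
deploy(ProviderBAdapter())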
Harp: Collective Communication on Hadoop Judy Qiu Many scientific applications are data-intensive. It is estimated that
organizations with high-end computing infrastructures and data centers are doubling the amount of data that they are
archiving every year. Harp extends MapReduce, enabling HPC-Cloud
Interoperability. We show how to apply Harp to support large-scale iterative
computations that are common in many important data mining and machine
learning applications. Further, one needs communication patterns beyond those made familiar by MapReduce. This leads
us to the Map-Collective programming model that captures the full range of
traditional MapReduce and MPI features, which is
built on a new communication abstraction, Harp, that is integrated with Hadoop. It provides optimized communication
operations on different data abstractions such as arrays, key-values and
graphs. With improved expressiveness and performance on collective
communication, Hadoop/Harp can do in-memory communication between Map tasks
without writing intermediate data to HDFS, enabling simultaneous support of
applications from HPC to Cloud. Our work includes a detailed performance evaluation on IaaS or HPC environments such as FutureGrid and the Big Red II supercomputer, and provides useful insights for both frameworks and applications. Short Bio Dr. Judy Qiu is an assistant professor of Computer Science in the School of Informatics and Computing at Indiana University. Link to website: http://www.cs.indiana.edu/~xqiu/ |
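The key idea above, collective communication among map tasks without intermediate writes to HDFS, can be illustrated with a deliberately naive in-memory allreduce; this is a conceptual sketch only and does not use Harp's actual API.

# Naive in-memory allreduce among "map tasks" (threads): every task
# contributes a partial value and all tasks see the global sum, with no
# intermediate data written to a file system.
import threading

class AllReduce:
    def __init__(self, n_tasks):
        self.lock = threading.Lock()
        self.barrier = threading.Barrier(n_tasks)
        self.acc = 0.0

    def allreduce(self, value):
        with self.lock:
            self.acc += value
        self.barrier.wait()          # all contributions are in
        result = self.acc
        self.barrier.wait()          # all tasks have read the result
        with self.lock:
            self.acc = 0.0           # idempotent reset by every task
        self.barrier.wait()          # reset complete before the next round
        return result

def map_task(rank, comm, results):
    partial = float(rank + 1)        # stand-in for a per-task partial result
    results[rank] = comm.allreduce(partial)

n = 4
comm = AllReduce(n)
results = [None] * n
threads = [threading.Thread(target=map_task, args=(r, comm, results)) for r in range(n)]
for t in threads: t.start()
for t in threads: t.join()
print(results)                       # every task sees the same global sum: 10.0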
Beowulf meets Exascale
System Software: A horizontally integrated framework Mark Seager CTO for HPC Systems, INTEL The challenges of system software for Exascale systems require co-design between hardware, system software and applications in order to address massive parallelism, enhance RAS, and support parallel I/O. However, the current approach to Linux cluster software, with its roots in Beowulf clusters, is simply an uncoordinated collection of elements/services (e.g., resource management, RAS, parallel file system). We discuss a new scalable system software approach that enables individual system software elements/services to leverage a horizontally integrated set of system software services. This allows Intel Architecture ecosystem contributors to enhance components to add value and to leverage this infrastructure to more efficiently produce a robust system software stack for Exascale-class systems. We also discuss how this system software stack can be scaled down to the broader HPC space. |
Coordination programming for self-tuning: the
challenge of a heterogeneous open environment Alex Shafarenko Compiler Technology and Computer Architecture
Group, Kahn process networks (KPNs) are a convenient basis
for coordination programming as they effectively isolate a component, which
encapsulates a self-contained algorithm, from the network that connects,
controls and synchronises the components’ activities. The strength of KPNs is
in their clear parallel semantics, defined by Kahn’s famous paper of 1974.
Their weakness is that they by themselves do not suggest relative priorities of the vertex processes for maximising performance on a finite system, and that, if they are not regulated properly, they may demand unlimited resources and deadlock when they do not get them. This talk will present the AstraKahn project, which introduces a coordination model for KPN self-regulation (also known as self-tuning), based on the concept of positive (supply) and negative (demand) pressures, the proliferation of vertices under large supply and demand, and the fragmentation of messages in order to improve the pressure situation. This is achieved by a map/reduce classification of KPN vertices, the use of synchro-vertices in the form of pressure-controlled FSMs, and the idea of vertex morphism, which enables message fragmentation. The talk will touch upon the AstraKahn coordination language, which consists of three layers: a Topology and Progress Layer, a Constraint Aggregation Layer, and a Data and Instrumentation Layer, which we maintain will in combination achieve a large degree of automatic self-tuning of coordination programs. The practical significance of this work is in its
attempt to propose an HPC approach for a non-autonomous computing platform,
such as public clouds, where a priori optimisations may be inhibited and
where run-time adaptation may be the only way to tackle platform
unpredictability. |
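A minimal sketch of the KPN substrate described above, with vertices as processes that communicate only through blocking FIFO channels, is given below; AstraKahn's pressure-based self-regulation is not modelled, and the vertex functions are illustrative.

# Minimal Kahn process network: vertices communicate only through blocking
# FIFO channels, so the result is deterministic regardless of scheduling.
import threading, queue

def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)                 # end-of-stream marker

def square(in_ch, out_ch):
    while True:
        x = in_ch.get()              # blocking read: Kahn semantics
        if x is None:
            out_ch.put(None)
            return
        out_ch.put(x * x)

def consumer(in_ch, result):
    while True:
        x = in_ch.get()
        if x is None:
            return
        result.append(x)

a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
result = []
threads = [
    threading.Thread(target=producer, args=(a, 5)),
    threading.Thread(target=square, args=(a, b)),
    threading.Thread(target=consumer, args=(b, result)),
]
for t in threads: t.start()
for t in threads: t.join()
print(result)                        # [0, 1, 4, 9, 16], independent of scheduling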
Exascale Programming Challenges: Adjusting to the new
normal for computer architecture John Shalf For the past twenty-five years, a single model of parallel programming (largely bulk-synchronous MPI) has for the most part been sufficient to permit the translation of algorithms into reasonable parallel programs for more complex applications. In 2004, however, a confluence of events
changed forever the architectural landscape that underpinned our current
assumptions about what to optimize for when we design new algorithms and
applications. We have been taught to
prioritize and conserve things that were valuable 20 years ago, but the new
technology trends have inverted the value of our former optimization targets.
The time has come to examine the end result of our extrapolated design trends
and use them as a guide to re-prioritize what resources to conserve in order
to derive performance for future applications. This talk will describe the
challenges of programming future computing systems. It will then provide some
highlights from the search for durable programming abstractions that more closely track emerging computer technology trends so
that when we convert our codes over, they will last through the next decade. |
Extreme-scale Architecture in the Neo-Digital
Age Thomas
Sterling & As the end of |
Programming Script-based Data Analytics
Workflows on Clouds Domenico Talia Department of Computer Engineering,
Electronics, and Systems Data analysis applications often involve large
datasets and are complex software systems in which multiple data processing tools
are executed in a coordinated way. Data analysis workflows are effective in
expressing task coordination and they can be designed through visual and
script-based frameworks. End users prefer the visual approach whereas expert
developers use workflow languages to program complex applications more
effectively. To provide Cloud users with an effective
script-based data analysis workflow formalism, we designed the JS4Cloud
language based on the well-known JavaScript language, so that users do not
have to learn a new programming language from scratch. JS4Cloud implements a
data-driven task parallelism that spawns ready-to-run tasks to Cloud
resources. It exploits implicit parallelism that frees users from duties like
work partitioning, synchronization and communication. In this talk we present how JS4Cloud has been
integrated within the Data Mining Cloud Framework (DMCF), a system supporting
the scalable execution of data analysis workflows on Cloud platforms. We also
describe how data analysis workflows are modeled as
JS4Cloud scripts and executed in parallel on DMCF to enable scalable data
processing on Clouds. |
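The data-driven task parallelism described above can be sketched as follows. JS4Cloud itself is JavaScript-based and runs on DMCF; this Python sketch only illustrates the underlying model, in which a task runs once the data elements it reads are available, with no explicit synchronization written by the user.

# Illustrative sketch of data-driven task parallelism (not JS4Cloud syntax).
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def task(fn, *inputs):
    """Submit fn; the wrapped task resolves its inputs (futures from earlier
    tasks or plain values) before running, so downstream work waits on data."""
    def run():
        args = [i.result() if hasattr(i, "result") else i for i in inputs]
        return fn(*args)
    return pool.submit(run)

# A tiny workflow: two independent "analysis" tasks run in parallel,
# and a third task runs automatically once both outputs exist.
clean   = task(lambda d: [x for x in d if x is not None], [3, None, 1, 7])
stats   = task(lambda d: sum(d), [10, 20, 30])
combine = task(lambda a, b: {"cleaned": a, "total": b}, clean, stats)

print(combine.result())   # {'cleaned': [3, 1, 7], 'total': 60}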
Extreme Scale Computing Advances &
Challenges in PIC Simulations William
M. Tang Fusion Simulation Program, Princeton Plasma Physics Lab. & The primary challenge in extreme scale computing is to translate the combination of rapid advances in supercomputing power and the emergence of effective new algorithms and computational methodologies into corresponding increases in the realism, and reductions in the “time-to-solution”, of advanced scientific and engineering codes used to model complex physical systems. If properly validated against experimental
measurements/observational data and verified with mathematical tests and
computational benchmarks, these codes can greatly improve high-fidelity
predictive capability for the behaviour of complex systems -- including
fusion-energy-relevant high temperature plasmas. The nuclear fusion energy
project “NuFuSE” within the International G8 Exascale Program has made excellent progress in developing
advanced codes for which computer run-time and problem size scale very well
with the number of processors on massively parallel many-core supercomputers.
A good example is the effective usage of the full power of modern leadership
class computational platforms at the petascale and
beyond to produce nonlinear particle-in-cell (PIC) gyrokinetic
simulations which have accelerated progress in understanding the nature of
plasma turbulence in magnetically-confined high temperature plasmas.
Illustrative results provide great encouragement for being able to include
increasingly realistic dynamics in extreme-scale computing campaigns with the
goal of enabling predictive simulations characterized by unprecedented
physics realism. This presentation will review progress and discuss
open issues associated with new challenges encountered in extreme scale
computing for the fusion energy science application domain. Some illustrative
examples will be presented of the algorithmic advances for dealing with low
memory per core extreme scale computing challenges on prominent
supercomputers worldwide. These include advanced homogeneous systems -- such
as the IBM-Blue-Gene-Q systems (“Mira” at the Argonne National Laboratory
& “Sequoia” at the Lawrence Livermore National Laboratory in the US), the
Fujitsu K Computer at the RIKEN AICS, Japan – as well as leading heterogeneous systems – such as the GPU-CPU hybrid system
“Titan” at the Oak Ridge National Laboratory, the world-leading TH-2
CPU/Xeon-Phi system in Guangzhou, China, and the new GPU-accelerated XC30
(“Piz Daint”) system at the CSCS in Switzerland. |
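The particle-in-cell method at the core of the codes discussed above alternates between scattering particle charge onto a grid and gathering grid fields back to the particles. The sketch below shows these two kernels in schematic 1D form with cloud-in-cell weighting; real gyrokinetic codes are five-dimensional, domain-decomposed, and heavily optimized for low memory per core, and the field solve is omitted here.

# Schematic 1D PIC scatter/gather kernels (cloud-in-cell weighting).
import numpy as np

def deposit_charge(x, q, n_cells, dx):
    """Scatter particle charge q at positions x onto a periodic 1D grid."""
    rho = np.zeros(n_cells)
    cell = np.floor(x / dx).astype(int) % n_cells
    frac = x / dx - np.floor(x / dx)                 # offset from left grid point
    np.add.at(rho, cell, q * (1.0 - frac))           # left-neighbour weight
    np.add.at(rho, (cell + 1) % n_cells, q * frac)   # right-neighbour weight
    return rho / dx

def gather_field(E, x, n_cells, dx):
    """Interpolate the grid field E back to the particle positions."""
    cell = np.floor(x / dx).astype(int) % n_cells
    frac = x / dx - np.floor(x / dx)
    return E[cell] * (1.0 - frac) + E[(cell + 1) % n_cells] * frac

# Example: 100,000 particles on a 64-cell periodic grid.
rng = np.random.default_rng(0)
n_cells, dx = 64, 1.0
x = rng.uniform(0.0, n_cells * dx, 100_000)
rho = deposit_charge(x, q=1.0, n_cells=n_cells, dx=dx)
E = np.sin(2 * np.pi * np.arange(n_cells) / n_cells)  # stand-in field (solve omitted)
E_at_particles = gather_field(E, x, n_cells, dx)
print(rho.sum() * dx, "total charge on the grid")     # equals the particle count
print(E_at_particles[:3], "field gathered at the first particles")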
From
Sensors to Supercomputers, Big Data Begins With Little Data Eric Van Hensbergen ARM Research, USA Semiconductor technology has made it possible to
build a 32 bit microprocessor subsystem with a sensor and a network
connection on a piece of silicon the size of a speck of dust, and do it almost
for free. As a result, over the next 5-10 years pretty much anything that can
benefit from being connected to the internet soon will be. This talk will explore the technology challenges
across the spectrum from sensors to supercomputers. It will discuss the opportunity for collaboration on
distributed system architectures which create a platform for deploying
intelligence capable of coping with the lifecycle of information as it goes
from little data to big data to valuable insights. |
Thermomechanical Behaviour and Materials Damage: Multimillion-Billion Atom Reactive Molecular
Dynamics Simulations Priya Vashishta Collaboratory for Advanced Computing and Simulations Departments of Chemical Engineering &
Materials Science, Physics & Astronomy, and Computer Science, Advanced materials and devices with nanometer grain/feature sizes are being developed to
achieve higher strength and toughness in materials and greater speeds in
electronic devices. Below 100nm, however, continuum description of materials
and devices must be supplemented by atomistic descriptions. Reactive molecular dynamics simulations are
used to investigate critical issues in the area of materials damage using
structural and dynamical correlations, and reactive processes in metals and
glasses under extreme conditions. In this talk I will discuss three simulations. Embrittlement of Nickel by Sulfur Segregation-Induced Amorphization: Impurities segregated to grain boundaries
of a material essentially alter its fracture behavior.
A prime example is sulfur segregation-induced embrittlement of nickel, where an observed relation
between sulfur-induced amorphization
of grain boundaries and embrittlement remains
unexplained. Here, 48 million-atom reactive-force-field molecular dynamics
simulations (MD), run for 45 million core hours using 64,000 cores of an IBM BlueGene/P, provide the missing link. Nanobubble Collapse in Water – Billion-atom Reactive Molecular Dynamics
Simulations: Cavitation
bubbles readily occur in fluids subjected to rapid changes in pressure. We
use billion-atom reactive molecular dynamics simulations on the full 163,840-processor BlueGene/P supercomputer, run for 67 million core hours, to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit whose volume is found to be directly proportional to the volume of the nanobubble. The gas-filled bubbles
undergo partial collapse and consequently the damage on the silica surface is
mitigated. Hydrogen-on-demand Using Metallic Alloy Particles in Water –
16,611-atom Quantum Molecular Dynamics Simulations: Hydrogen production from water using Al particles
could provide a renewable energy cycle. However, its practical application is
hampered by the low reaction rate and poor yield. Our large quantum molecular
dynamics simulations involving up to 16,611 atoms on the 786,432-processor
can be achieved by alloying Al particles with Li. A key nanostructural
design is identified where water dissociation and hydrogen production require
very small activation energies. Furthermore, dissolution of Li atoms into
water produces a corrosive basic solution that inhibits the formation of a
reaction-stopping oxide layer on the particle surface, thereby increasing the
hydrogen yield. Acknowledgement: This research
was supported by the DOE-BES-Theoretical Condensed Matter Physics Grant Number DE-FG02-04ER46130. The
computing resources for this research were provided by a DOE—Innovative and
Novel Computational Impact on Theory and Experiment (INCITE) award. We thank
Paul Messina, Nichols Romero, William Scullin and
the support team of Argonne Leadership Computing Facility for scheduling 67
million core hours in units of 60-hour blocks on the full BlueGene/P
for the simulations and Joseph Insley, |
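For readers unfamiliar with molecular dynamics, the sketch below shows the basic velocity-Verlet time-stepping loop with a simple Lennard-Jones pair potential. It is only a toy illustration; the simulations described above use reactive force fields and billions of atoms on BlueGene-class machines.

# Toy classical MD: Lennard-Jones pair forces plus velocity-Verlet integration.
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces for a small particle set (no cutoff)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r2 = np.dot(r_vec, r_vec)
            inv_r6 = (sigma * sigma / r2) ** 3
            f = 24.0 * eps * (2.0 * inv_r6 ** 2 - inv_r6) / r2 * r_vec
            forces[i] += f
            forces[j] -= f
    return forces

def velocity_verlet(pos, vel, dt=1e-3, steps=1000, mass=1.0):
    f = lj_forces(pos)
    for _ in range(steps):
        vel += 0.5 * dt * f / mass
        pos += dt * vel
        f = lj_forces(pos)
        vel += 0.5 * dt * f / mass
    return pos, vel

# Example: 8 atoms on a small cube, spaced near the LJ potential minimum.
rng = np.random.default_rng(1)
grid = np.array([[i, j, k] for i in range(2) for j in range(2) for k in range(2)], dtype=float)
pos = grid * 2 ** (1 / 6)
vel = 0.05 * rng.standard_normal(pos.shape)
pos, vel = velocity_verlet(pos, vel)
print("kinetic energy after 1000 steps:", 0.5 * np.sum(vel ** 2))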
Clouds
for meteorology: two case studies Jose Luis
Vazquez-Poletti Dpt. de Arquitectura de Computadores y
Automática Universidad Complutense
de Madrid, Spain Meteorology is among the most promising areas that benefit
from cloud computing, due to its intersection with society’s critical
aspects. Executing meteorological applications involves HPC and HTC
challenges, but also economic ones. The present talk will introduce two cases with
different backgrounds and motivations, but always sharing a similar cloud
methodology: the first one is about weather forecasting in the context of
planet Mars exploration; and the second one deals with data processing from
weather sensor networks, in the context of an agriculture improvement plan at |
Medical practice: diagnostics, treatment and
surgery in supercomputer centers Vladimir
V. Voevodin We are used to the extraordinary capabilities of
supercomputers and expect them to be applied in practice accordingly. These
are reasonable expectations; however, reality isn’t so optimistic. It is
widely known how inefficient supercomputers can be when applied to actual
problems: only a tiny share of their peak performance is usually achieved.
However, few people are aware of efficiency levels demonstrated by a
supercomputer center as a whole. While the
efficiency factor of a supercomputer executing a particular application is
comparable to that of a steam locomotive (about 5%), the total efficiency of
a supercomputer center constitutes only a small
fraction of it. The losses that occur at each stage may be insignificant, but
they accumulate over the processing of the entire user job flow and multiply considerably. Every detail of this process is important, and all elements of supercomputer centers should be taken into consideration, starting from job queue policy and job flow structure and ending with system software
configuration and efficient operation of engineering infrastructure. Is it possible to significantly increase the
efficiency of supercomputer centers without
investing huge amounts of money into their upgrades? Yes, it is. But it
requires permanent diagnostics, very similar to medical ones, and if
necessary – intensive treatment and emergency surgery on supercomputer
systems. |
Robert
Wisniewski Chief Software Architect Exascale
Computing INTEL Corporation Many discussions have occurred on system software
needed to support exascale. In recent talks I described a simultaneously revolutionary and evolutionary approach, but observed that exascale is just a step along the PEZ (Peta-Exa-Zetta) path.
HPC researchers study both capacity computing (cloud-like workloads)
and capability computing (classical HPC and big science workloads). Some researchers see a convergence of these
needs and technologies. Part of the answer will lie in the future needs of each of these paths; this talk will explore the capability aspect. A zettascale or yottascale capability machine would open new frontiers in the contributions computing could make to fundamental science, including breakthroughs in the biological sciences, which are just beginning to benefit from capability-class HPC machines. It is important to recognize that although there need not be, and applications do not want, a discontinuity, there needs to be a significant shift in software models. In this talk, I will present a peek into what system software might
look like for a post-exascale machine. While I will base this view of system
software on potential hardware projections from one side and applications'
needs on the other, I will be liberal in my assumptions about where each of
these could get to. The talk will
therefore be a window into what system software might look like to support
capability machines in the post-exascale era. |