HPC 2016
High Performance Computing
FROM CLOUDS AND BIG DATA TO EXASCALE AND BEYOND
An International Advanced Workshop
June 27 – July 1, 2016, Cetraro, Italy
Final Programme
L. GRANDINETTI (Chair), University of Calabria, ITALY
F. BAETKE, Hewlett Packard, U.S.A.
P. BECKMAN, Argonne National Lab., U.S.A.
C. CATLETT, Argonne National Lab. and University of Chicago, U.S.A.
G. DE PIETRO, National Research Council of Italy, ITALY
J. DONGARRA, University of Tennessee, U.S.A.
S. S. DOSANJH, Lawrence Berkeley National Lab., U.S.A.
I. FOSTER, Argonne National Lab. and University of Chicago, U.S.A.
G. FOX, Indiana University, U.S.A.
W. GENTZSCH, The UberCloud, GERMANY
V. GETOV, University of Westminster, U.K.
G. JOUBERT, Technical University Clausthal, GERMANY
E. LAURE, Royal Institute of Technology Stockholm, SWEDEN
C. A. LEE, The Aerospace Corporation, U.S.A.
T. LIPPERT, Juelich Supercomputing Centre, GERMANY
I. LLORENTE, Universidad Complutense de Madrid, SPAIN
B. LUCAS, University of Southern California, U.S.A.
S. MATSUOKA, Tokyo Institute of Technology, JAPAN
P. MESSINA, Argonne National Laboratory, U.S.A.
V. PASCUCCI, University of Utah and Pacific Northwest National Lab, U.S.A.
N. PETKOV, University of Groningen, NETHERLANDS
J. QIU, School of Informatics and Computing, Indiana University, U.S.A.
M. SEAGER, INTEL, U.S.A.
S. SEKIGUCHI, National Institute of Industrial Science and Technology, JAPAN
T. STERLING, Indiana University, U.S.A.
R. STEVENS, Argonne National Laboratory, U.S.A.
D. TALIA, University of Calabria, ITALY
W. TANG, Princeton University, U.S.A.
L. GRANDINETTI, Center of Excellence for High Performance Computing, UNICAL, Italy
T. LIPPERT, Institute for Advanced Simulation, Juelich Supercomputing Centre, Germany
Organizing Committee
L. GRANDINETTI (Co-Chair), ITALY
T. LIPPERT (Co-Chair), GERMANY
M. ALBAALI (OMAN)
C. CATLETT (U.S.A.)
J. DONGARRA (U.S.A.)
W. GENTZSCH (GERMANY)
O. PISACANE (ITALY)
M. SHEIKHALISHAHI (ITALY)
AMAZON WEB SERVICES
ARM
CRAY
CSCS – SWISS NATIONAL SUPERCOMPUTING CENTER
HEWLETT PACKARD ENTERPRISE
INTEL
JUELICH SUPERCOMPUTING CENTER, Germany
MELLANOX TECHNOLOGIES
MICRON TECHNOLOGY
NEC
SCHNEIDER ELECTRIC
DIPARTIMENTO DI INGEGNERIA DELL’INNOVAZIONE – UNIVERSITÀ DEL SALENTO
UNIVERSITÀ DELLA CALABRIA
NATIONAL RESEARCH COUNCIL OF ITALY - ICAR - INSTITUTE FOR HIGH PERFORMANCE COMPUTING AND NETWORKS
Media Partners
Free Amazon Web Services credits for all HPC 2016 delegates
Amazon is very pleased to be able to provide $200 in service credits to all HPC 2016 delegates. Amazon Web Services provides a collection of scalable high performance and data-intensive computing services, storage, connectivity, and integration tools. AWS allows you to increase the speed of research and to reduce costs by providing Cluster Compute or Cluster GPU servers on demand. You have access to a full-bisection, high-bandwidth network for tightly coupled, I/O-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.
UberCloud is the online community and marketplace platform for engineers and scientists to discover, try, and buy computing time, on demand, in the Cloud. Our novel software containers facilitate software packaging and portability, simplify access and use of cloud resources, and ease software maintenance and support for end-users and their service providers. Please register for the UberCloud Voice Newsletter, or to perform an HPC Experiment in the Cloud.
Jim Ahrens, Los Alamos National Laboratory, Los Alamos, NM, USA
James A. Ang, Exascale Computing Program, Center for Computing Research, Sandia National Laboratories, Albuquerque, NM, USA
Frank Baetke, HPC in Academia and Scientific Research, Hewlett Packard, Palo Alto, CA, USA
Peter Beckman, Exascale Technology and Computing Institute, Argonne National Laboratory, Argonne, IL, USA
Isabel Beichl, National Institute of Standards and Technology, Gaithersburg, MD, USA
Euro Beinat, University of Salzburg, Salzburg, AUSTRIA
Budhendra Bhaduri, Urban/GIS Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Gil Bloch, Mellanox Technologies, Sunnyvale, CA, USA
Brendan Bouffler, Scientific Computing, Amazon Web Services, London, UNITED KINGDOM
Ronald Brightwell, Sandia National Laboratories, Albuquerque, NM, USA
Charlie Catlett, Math & Computer Science Div., Argonne National Laboratory, Argonne, IL, and Computation Institute of The University of Chicago and Argonne National Laboratory, Chicago, IL, USA
Eugenio Cesario, National Research Council of Italy, ICAR – CNR, Rende – Cosenza, ITALY
David Chadwick, University of Kent, Canterbury, UNITED KINGDOM
Marcello Coppola, STMicroelectronics, Grenoble, FRANCE
Beniamino Di Martino, Department of Industrial and Information Engineering, University of Naples 2, Naples, ITALY
Jack Dongarra, Innovative Computing Laboratory, Computer Science Dept., University of Tennessee, Knoxville, TN, USA
Sudip S. Dosanjh, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Ian Foster, Math & Computer Science Div., Argonne National Laboratory, Argonne, IL, and Dept. of Computer Science, The University of Chicago, Chicago, IL, USA
Geoffrey Fox, Community Grid Computing Laboratory, Indiana University, Bloomington, IN, USA
Wolfgang Gentzsch, The UberCloud, GERMANY
Vladimir Getov, Department of Engineering, Faculty of Science and Technology, University of Westminster, London, UNITED KINGDOM
Brett Goldstein, University of Chicago, Chicago, IL, USA
Sergei Gorlatch, Universitaet Muenster, Institut für Informatik, Muenster, GERMANY
Torsten Hoefler, Scalable Parallel Computing Lab, Computer Science Department, ETH Zurich, Zurich, SWITZERLAND
Takeo Hosomi, System Platform Research Laboratories, NEC, Kanagawa, JAPAN
Carl Kesselman, Information Sciences Institute, University of Southern California, Marina del Rey, Los Angeles, CA, USA
David Keyes, King Abdullah University of Science and Technology, Thuwal, SAUDI ARABIA
Julia Lane, Wagner School, Center for Urban Science and Progress, New York University, New York, NY, USA
Craig Lee, Computer Systems Research Dept., The Aerospace Corporation, El Segundo, CA, USA
Thomas Lippert, Juelich Supercomputing Centre, Forschungszentrum Juelich, Juelich, GERMANY
Yutong Lu, School of Computer Science, National University of Defense Technology, Changsha, Hunan Province, CHINA
Stefano Markidis, KTH Royal Institute of Technology, Stockholm, SWEDEN
Patrick Martin, School of Computing, Queen’s University, Kingston, Ontario, CANADA
Satoshi Matsuoka, Global Scientific Information and Computing Center & Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, JAPAN
Paul Messina, DOE Exascale Computing Project, Argonne National Laboratory, Argonne, IL, USA
Jarek Nabrzyski, Department of Computer Science and Engineering, University of Notre Dame, Center for Research Computing, and Great Lakes Consortium for Petascale Computation, Notre Dame, Indiana, USA
Stefano Nativi, National Research Council of Italy, Florence, ITALY
Manish Parashar, Dept. of Computer Science, Rutgers University, Piscataway, NJ, USA
Valerio Pascucci, University of Utah, Center for Extreme Data Management, Analysis and Visualization, Scientific Computing and Imaging Institute, School of Computing, and Pacific Northwest National Laboratory, Salt Lake City, UT, USA
Stephen Pawlowski, Advanced Computing Solutions, Micron Technology, Portland, OR, USA
Kristen Pudenz, Quantum Applications Engineering, Lockheed Martin, Fort Worth, TX, USA
Judy Qiu, School of Informatics and Computing and Pervasive Technology Institute, Indiana University, USA
Ulrich Ruede, Lehrstuhl fuer Simulation, Universitaet Erlangen-Nuernberg, Erlangen, GERMANY
Sébastien Rumley, Lightwave Research Laboratory, Department of Electrical Engineering, School of Engineering and Applied Science, Columbia University, New York, NY, USA
Thomas Schulthess, CSCS Swiss National Supercomputing Centre, Lugano, and ETH Zurich, SWITZERLAND
John Shalf, Lawrence Berkeley National Laboratory, Computing Research Division and National Energy Research Supercomputing Center, Berkeley, CA, USA
Sadasivan Shankar, Harvard University, School of Engineering and Applied Sciences, Cambridge, MA, USA
Karl Solchenbach, Intel Exascale Labs Europe, GERMANY
Thomas Sterling, School of Informatics and Computing and CREST Center for Research in Extreme Scale Technologies, Indiana University, Bloomington, IN, USA
Rick Stevens, Argonne National Laboratory and Department of Computer Science, The University of Chicago, Argonne and Chicago, USA
Francis Sullivan, IDA/Center for Computing Sciences, Bowie, MD, USA
Domenico Talia, Department of Computer Engineering, Electronics, and Systems, University of Calabria, ITALY
William Tang, Princeton University, Dept. of Astrophysical Sciences, Plasma Physics Section, Fusion Simulation Program, Princeton Plasma Physics Lab. and Princeton Institute for Computational Science and Engineering, Princeton, USA
Adrian Tate, Cray EMEA Research Lab, UNITED KINGDOM
Steve Tuecke, Computation Institute, The University of Chicago, Chicago, IL, USA
Eric Van Hensbergen, ARM Research, Austin, TX, USA
Vladimir Voevodin, Moscow State University, Research Computing Center, Moscow, RUSSIA
Monday, June 27th

9:00 – 9:15    Welcome Address

Session: State of the Art and Future Scenarios
9:15 – 9:45    J. DONGARRA
9:45 – 10:15   P. BECKMAN – What can we Change?
10:15 – 10:45  I. FOSTER
10:45 – 11:15  S. MATSUOKA – From FLOPS to BYTES: Disruptive End of Moore’s Law beyond Exascale
11:15 – 11:45  COFFEE BREAK
11:45 – 12:15  R. STEVENS – DOE-NCI Joint Development of Advanced Computing Solutions for Cancer
12:15 – 12:45  C. KESSELMAN
12:45 – 13:00  CONCLUDING REMARKS

Session: Emerging Computer Systems and Solutions
16:30 – 17:00  F. BAETKE – Trends in System Architectures: Towards “The Machine” and Beyond
17:00 – 17:25  K. SOLCHENBACH
17:25 – 17:50  A. TATE – Towards Support of Highly-Varied Workloads on Supercomputers
17:50 – 18:15  E. VAN HENSBERGEN
18:15 – 18:45  COFFEE BREAK
18:45 – 19:10  T.B.A.
19:10 – 19:35  T. HOSOMI
19:35 – 20:00  B. BOUFFLER
20:00 – 20:10  CONCLUDING REMARKS
Tuesday, June 28th

Session: Advances in HPC Technology and Systems
9:00 – 9:25    S. GORLATCH – Using Modern C++ with Multi-Staging for Unified Programming on GPU Systems
9:25 – 9:50    M. COPPOLA – Generic Packet Processing Unit: a novel way to implement low cost and efficient FPGA computing
9:50 – 10:15   V. GETOV – Application-Specific Energy Modeling of Multi-Core Processors
10:15 – 10:40  J. NABRZYSKI
10:40 – 11:05  J. SHALF
11:05 – 11:15  CONCLUDING REMARKS
11:15 – 11:45  COFFEE BREAK

Session: Software and Architecture for Extreme Scale Computing I
11:45 – 12:10  S. DOSANJH
12:10 – 12:35  G. FOX – Application and Software Classifications that motivate Big Data and Big Simulation Convergence
12:35 – 13:00  R. BRIGHTWELL

Session: Software and Architecture for Extreme Scale Computing II
16:30 – 17:00  T. LIPPERT – t.b.a.
17:00 – 17:25  J. AHRENS
17:25 – 17:50  S. MARKIDIS – Towards a Continuous Description of Compute and Idle Phases in Scientific Parallel Applications
17:50 – 18:15  V. VOEVODIN
18:15 – 18:45  COFFEE BREAK
18:45 – 19:15  T. HOEFLER – Progress in automatic GPU compilation and why you want to run MPI on your GPU
19:15 – 19:45  G. BLOCH
19:45 – 20:00  CONCLUDING REMARKS
Wednesday, June 29th

Session: Exascale Computing and Beyond
9:00 – 9:25    R. STEVENS – The potential to augment HPC systems with Neuromorphic Computing Accelerators
9:25 – 9:50    T. STERLING
9:50 – 10:15   S. RUMLEY
10:15 – 10:40  S. PAWLOWSKI
10:40 – 11:05  P. MESSINA – A Path to Exascale
11:05 – 11:35  COFFEE BREAK
11:35 – 12:00  T. SCHULTHESS – t.b.a.
12:00 – 12:25  J. ANG – Exascale System and Node Architectures: The Summit and Beyond
12:25 – 12:50  J. SHALF
12:50 – 13:00  CONCLUDING REMARKS

Session: Cloud Computing Technology and Systems
15:45 – 16:10  C. LEE
16:10 – 16:35  S. TUECKE
16:35 – 17:00  S. NATIVI
17:00 – 17:25  D. CHADWICK – Homogeneous authorization policies in heterogeneous IAAS clouds
17:25 – 17:50  B. DI MARTINO
17:50 – 18:15  J. QIU – Convergence of HPC and Clouds for Large-Scale Data Enabled Science
18:15 – 18:45  COFFEE BREAK
18:45 – 20:00  PANEL DISCUSSION: “What is Capable Exascale Computing?” Chairman: P. Messina, Argonne National Laboratory, DOE, U.S.A.
Thursday, June 30th
Friday, July 1st

Session: Challenging applications of HPC and Clouds
9:00 – 9:30    W. GENTZSCH – Toward Democratization of HPC with Novel Software Containers
9:30 – 10:00   S. SHANKAR – Co-design 3.0 – Configurable Extreme Computing leveraging Moore’s Law for Real Applications
10:00 – 10:30  K. PUDENZ
10:30 – 11:00  D. KEYES
11:00 – 11:30  COFFEE BREAK
11:30 – 12:00  W. TANG – Kinetic Turbulence Simulations on Top Supercomputers Worldwide
12:00 – 12:30  U. RUEDE
12:30 – 12:45  CONCLUDING REMARKS
Paul Messina, Argonne National Laboratory, Argonne, IL, USA
Wolfgang Gentzsch, The UberCloud, GERMANY
Gerhard Joubert, Technical University Clausthal, GERMANY
Thomas Sterling, Indiana University, Bloomington, IN, USA
Thomas Sterling, Indiana University, Bloomington, IN, USA
Peter Beckman, Argonne National Laboratory, Argonne, IL, USA
Valerio Pascucci, University of Utah and Pacific Northwest National Laboratory, Salt Lake City, UT, USA
Craig A. Lee, The Aerospace Corporation, El Segundo, CA, USA
David Keyes, King Abdullah University of Science and Technology, Thuwal, SAUDI ARABIA
Vladimir Getov, University of Westminster, London, U.K.
What is Capable Exascale Computing?
Chairman: P. Messina, Argonne National Laboratory, DOE, U.S.A.
Participants: P. Messina (Argonne National Laboratory), D. Keyes (King Abdullah University of Science and Technology), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), M. Parashar (Rutgers University), T. Sterling (Indiana University)
Exascale computing that is “capable” must provide more than hardware whose theoretical peak speed is one or more ExaFLOPS. The panel participants will provide their viewpoints on the features of a computing ecosystem that deserves the adjective “capable,” such as a robust software stack that supports a broad variety of applications, the ability to process vast data volumes, and reliable, affordable operations. |
The Potential for Deep Learning to Harness Increasing Flows of Urban Data
Chairman: C. Catlett, Argonne National Laboratory, DOE, U.S.A.
Participants: C. Catlett (Argonne National Laboratory), J. Lane (New York University), B. Goldstein (University of Chicago), B. Bhaduri (Oak Ridge National Laboratory), E. Beinat (University of Salzburg), E. Cesario (National Research Council of Italy)
New approaches to data analysis, including machine learning and “deep learning” techniques, are gaining traction in many science areas ranging from computational biology to advanced manufacturing. Cities—both physical and human infrastructures and interconnected systems—provide many opportunities for the use of deep learning, particularly in conjunction with new data from sensor networks. For instance, could intelligent intersections apply deep learning to images and video in order to track “near misses” and adjust traffic signals in real time to improve safety? Concurrently, within cities are sources of data that are not open, but can be used internally. As an analog to predicting the failure of jet engines based on leading indicators, could deep learning techniques help cities and urban scientists discover models that describe the interdependencies between economics, public safety, education, and other factors, leading to proactive rather than reactive urban planning? A fundamental question is what opportunities exist for exascale computing and deep learning to help steer the present “smart city” and “Internet of Things” movements toward substantive, long-term urban challenges rather than the more common “smart city” focus areas of engineering and urban mechanics (such as reducing traffic congestion or improving parking). |
Envisioning Human-in-the-loop Interactions
with Massive Scientific Simulations and Experiments in the Age of Exascale HPC and Big Data Jim
Ahrens Los Alamos National Laboratory, Los Alamos,
NM, USA This talk will describe a vision for interacting
with massive scientific simulations and experiments to better understand
underlying natural physical properties and processes. Specific solutions that
take advantage of advances in HPC and Big Data technology will be discussed. |
Exascale System and Node Architectures: The Summit
and Beyond James A.
Ang Exascale Computing Program, Center
for Computing Research Sandia National Laboratories, USA The U.S. Advanced Simulation and Computing (ASC)
program has been applying a co-design methodology for over five years. Our
use of co-design has evolved along a continuum— from the early reactive
approach, to the current proactive methodology, and towards a proposed
transformative path. The goal of transformative co-design is to leverage an
opportunity to develop future hardware and system architecture designs that
are unconstrained by current technology roadmaps [1]. The HPC
community has been working with proxy applications to represent how our real
HPC applications use advanced architectures. Proxy applications are also a
communication vehicle for helping vendors and system architects understand
how our real applications perform. The Advanced Scientific Computing Research
(ASCR) program has been funding the Computer Architecture Lab (CAL) project
since 2013 to develop a new co-design communication vehicle. CAL is
developing abstract machine models, and their associated proxy architectures
to help the DOE application teams reason about and design their applications
to map into advanced system and node architectures [2]. On July 29, 2015, President Obama announced the U.S.
National Strategic Computing Initiative (NSCI). As a part of this
Presidential Directive, the U.S. DOE is launching the Exascale
Computing Project (ECP) with direct support from both the ASC and ASCR
programs. This project has four technical focus areas: Application
Development, Software Technologies, Hardware Technologies, and Exascale Systems. Under Hardware Technology, a new effort
has been launched called PathForward to support
vendor-led system and node designs that offer the opportunity for
“transformative” co-design in which the 2023 exascale
system, node, and component designs are influenced by the needs of ECP
applications and associated system software. The U.S. DOE ECP has a goal of at least two diverse
2023 Exascale Systems. This goal may be viewed as
The Summit. Overall, NSCI has goals that extend Beyond the Summit [3]. In my presentation I will address
both ECP goals and NSCI goals from both a technology and programmatic
perspective.
[1] ASC Co-Design Strategy, J.A. Ang, T.T. Hoang, S.M. Kelly, A. McPherson, R. Neely, SAND 2015-9821R, February 2016. http://nnsa.energy.gov/sites/default/files/nnsa/inlinefiles/ASC_Co-design.pdf
[2] Abstract Machine Models and Proxy Architectures for Exascale Computing, J.A. Ang, R.F. Barrett, R.E. Benner, D. Burke, C. Chan, D. Donofrio, S.D. Hammond, K.S. Hemmert, S.M. Kelly, H. Le, V.J. Leung, D.R. Resnick, A.F. Rodrigues, J. Shalf, D. Stark, D. Unat, N.J. Wright, May 2014. http://www.cal-design.org/publications
[3] Beyond the Summit, Skinner, Todd, Penguin Group Inc, New York, NY, USA, October 2003. |
Trends in System Architectures: Towards ”The
Machine” and Beyond Frank Baetke HPC in Academia and Scientific Research Hewlett Packard, Palo Alto, CA, USA The talk
will address trends in system architecture for HPC and will include related
aspects of Big Data and IoT. A specific focus will
be on innovative components like next generation memory interconnects,
non-volatile memory and silicon photonics that play a key role in future
system designs. HPE’s “The Machine” will be used to bring those components
into the context of an actual system implementation. Related options and challenges
at the level of system software, middleware and programming paradigms will
also be addressed. |
Collective Sensing and large-scale
predictions: two case studies Euro Beinat University of Salzburg, Austria We use the
term “Collective Sensing” to describe the set of methods and tools used to analyze, describe and predict large-scale human dynamics
based on the growing availability of digital transaction data (telecom,
banking, transportation, sensors, social media). In the recent past an entire
stream of literature has developed within and across disciplines with the aim
of exploiting new data sources and data science methods to provide better
ways to understand the collective behavior of
cities, communities or economic sectors. The presentation positions
collective sensing in this broad ecosystem and then focuses on two case
studies. In the first, we explore the use of online learning algorithms for
predicting the short-term location of entire populations. The method draws
from sequential learning and leverages the history of millions of
individuals, who are used as anonymous “experts” of each other's mobility, to
improve individual predictability. The validation on one year of telecom data
shows that the method significantly exceeds traditional prediction methods,
especially in cases when the data history is short (the case of tourists, for
instance). The second use case focuses on the prediction of road incidents on
the basis of traffic and weather data. It describes data structuring and the
design of three types of neural networks (deep learning, convolutional and
LSTM) used for different types of predictions. The presentation illustrates
the results of the networks after training on a 5-year dataset and how they
outperform purely statistical predictions. |
Landscape Dynamics, Geographic Data and
Scalable Computing: The Oak Ridge Experience Budhendra Bhaduri Urban/GIS Center,
Oak Ridge National Laboratory, Oak Ridge, TN, USA Understanding
change through analysis and visualization of landscape processes often
provides the most effective tool for decision support. Analysis
of disparate and dynamic geographic data provides an effective component of
an information extraction framework for multi-level reasoning, query, and
extraction of geospatial-temporal features. With increasing temporal
resolution of geographic data, there is a compelling motivation to couple the
powerful modeling and analytical capability of a
GIS to perform spatial-temporal analysis and visualization on dynamic data
streams. However,
the challenge in processing large volumes of high resolution earth
observation and simulation data by traditional GIS has been compounded by the
drive towards real-time applications and decision support. Drawing from our
experience at Oak Ridge National Laboratory providing scientific and
programmatic support for federal agencies, this presentation will highlight
progress and challenges of some of the emerging computational approaches,
including algorithms and high performance computing, illustrated with
population and urban dynamics, sustainable energy and mobility, and climate
change science. |
Exascale by Co-Design Architecture Gil
Bloch Mellanox Technologies, Sunnyvale, CA, USA High performance computing has begun scaling beyond
Petaflop performance towards the Exaflop mark. One
of the major concerns throughout the development toward such performance
capability is scalability – at the component level, system level, middleware
and the application level. A Co-Design approach between the development of
the software libraries and the underlying hardware can help to overcome those
scalability issues and to enable a more efficient design approach towards the
Exascale goal. |
HPC clusters as code in the [almost] infinite
cloud Brendan Bouffler Scientific Computing Amazon Web Services,
London, UNITED KINGDOM HPC clusters
have exploded in capability in the last decade leading to not only
breakthrough discoveries like gravitational waves but also techniques that
allow us to screen compounds for drug suitability and design better
headlights for cars that reduce drag and silence cabin noise. HPC has become
a tool that spans industry, research and education, and yet remains out of
reach for many because owning a cluster often means a significant investment
and complex integration. In the
cloud, however, not owning an HPC cluster can be one of the most productive
ways to compute everything from the fluid dynamics of a milk bottle to the
evolution of the universe, since clusters with specific purposes can be made
available off-the-shelf and can be procured in just the right amounts of
capacity. Coupled with access to large public datasets like those from earth observation satellites or genomic databases, it’s easy to imagine how HPC can become an even more common tool for
humble and grand workloads alike. In this
talk we’ll discuss the technologies underpinning the AWS cloud, the scale at
which we operate and how we build clusters on the fly, using all the same
tools we’ve come to depend upon, and a few new ones to boot. We’ll show real
examples from customers and partners who’ve broken new ground because of the
agility the cloud offers. |
The Myth of a Converged Software Stack for
HPC and Big Data Ronald
Brightwell Sandia National Laboratories, Albuquerque,
NM, USA The
notion that one operating system or a single software stack will support the
emerging and future needs of the HPC and Big Data communities is a fantasy.
There are many technical and non-technical reasons why functional
partitioning through customized software stacks will continue to persist. Rather
than searching the ends of rainbows for a single software stack that
satisfies a diverse and competing set of requirements, approaches that enable
the use and integration of multiple software stacks should be pursued. This
talk will describe the challenges that motivate the need to support multiple
concurrent software stacks for enabling application composition, more complex
application workflows, and a potentially richer set of usage models for
extreme-scale HPC systems. The Hobbes project led by Sandia National
Laboratories is exploring operating system infrastructure for supporting
multiple concurrent software stacks. This talk will describe this infrastructure
and discuss issues that motivate future exploration. |
A Proposed Exascale
Agenda for Urban Sciences Charlie
Catlett Math & Computer Science Div., Argonne
National Laboratory & Computation Institute of The University
of Chicago, Chicago, IL, USA Urbanization
is one of the great challenges and opportunities of this century,
inextricably tied to global challenges ranging from climate change to
sustainable use of energy and other natural resources, and from personal
health and safety to accelerating innovation in metropolitan communities.
Enabling science- and evidence-based urban design, policy, and operation will
require discovery, characterization, and quantification of the
interdependencies between major metropolitan sectors. Many of these sectors
or systems are modeled individually, but in order
to optimize the design, planned evolution, and operation of cities it is
essential that we quantify and understand how they interact. Coupled
multi-scale models will be essential for this discovery process. We will
discuss the concept of a general framework for such coupled models,
highlighting several example coupled systems as well as the challenges of
integrating data from sensor networks. |
Homogeneous authorization policies in
heterogeneous IAAS clouds David
Chadwick, Carlos Ferraz and Ioram
Sette University of Kent, Canterbury, United
Kingdom How can a
tenant administrator of multiple cloud accounts set and enforce a single
authorisation policy throughout a multi-cloud infrastructure? In this
presentation we will describe the solution we have designed and implemented,
using OpenStack and Amazon clouds as exemplars. We propose a Federated
Authorisation Policy Management Service (FAPManS),
which holds the global authorisation policy in DNF, along with the
authorisation ontology and rules for mapping from the global policy terms to
cloud specific ones. Policy adaptors convert the global policy into local
ones, so that each cloud system keeps its existing authorisation mechanism
without needing to change it. A publish-subscribe mechanism ensures the policies
are synchronized. We will conclude by listing the strengths and weaknesses of
our approach, and where further work still needs to be done. |
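As a small illustration of what a global authorisation policy held in disjunctive normal form (DNF) looks like, the sketch below is hypothetical and is not the FAPManS data model: a policy is a disjunction of conjunctions of attribute requirements, and a request is permitted if any one conjunction is fully satisfied; cloud-specific policy adaptors would then translate the global attribute names into, for example, OpenStack or AWS terms.

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Attrs = std::map<std::string, std::string>; // attribute name -> required value
    using Conjunction = Attrs;                        // every pair must be satisfied
    using DnfPolicy = std::vector<Conjunction>;       // any one conjunction may grant access

    // Permit iff at least one conjunction is fully satisfied by the request attributes.
    bool permits(const DnfPolicy& policy, const Attrs& request) {
        for (const auto& rule : policy) {
            bool satisfied = true;
            for (const auto& term : rule) {
                auto it = request.find(term.first);
                if (it == request.end() || it->second != term.second) { satisfied = false; break; }
            }
            if (satisfied) return true;
        }
        return false;
    }

    int main() {
        // Hypothetical global policy: (role = admin) OR (role = developer AND project = hpc2016)
        DnfPolicy policy = { { {"role", "admin"} },
                             { {"role", "developer"}, {"project", "hpc2016"} } };
        Attrs request = { {"role", "developer"}, {"project", "hpc2016"} };
        std::cout << (permits(policy, request) ? "permit" : "deny") << "\n";
        return 0;
    }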
Generic Packet Processing Unit a novel way to
implement low cost and efficient FPGA computing Marcello
Coppola STMicroelectronics, Grenoble, FRANCE Heterogeneous systems are now widely used in many computing markets, including consumer, HPC, automotive and networking. Heterogeneous computing makes it possible to improve performance-power trade-offs compared to homogeneous solutions, because such systems can offload computation kernels to specialized islands of computation. This presentation describes a novel technology, called the Generic Packet Processor Unit (GPPU), for offloading (or “dispatching”) computation kernels from the host processor to FPGA computation islands in a more efficient manner. In contrast to traditional approaches that require explicit data movement, the GPPU infrastructure (composed of the GPPU hardware module and the AQLSM software runtime) enables FPGA computing islands to operate on the same virtual address space as the host processor. During the presentation we also show how the GPPU allows the host processor to schedule work to the target FPGA in a smart and efficient way by removing the operating system overhead. The talk will conclude by showing how the use of the GPPU in heterogeneous systems enhances programmability and further reduces the latency of critical operations. |
Semantic Technologies to support Cloud
Applications’ Portability and Interoperability on Multiple Clouds Beniamino
Di Martino Department of Industrial and Information
Engineering University of Naples 2, Italy Cloud
vendor lock-in and interoperability gaps arise (among many reasons) when
semantics of resources and services, and of Application Programming
Interfaces are not shared. Standards and techniques borrowed from SOA and
Semantic Web Services
areas might help in gaining shared, machine readable description of Cloud
offerings (resources, Services at Platform and Application level, and their
API groundings), thus allowing automatic discovery, matchmaking, and thus
supporting selection, brokering, interoperability and composition of Cloud
Services among multiple Clouds. This talk
will in particular illustrate the outcomes of the mOSAIC
EU funded project (http://www.mosaic-cloud.eu): a Cloud Ontology, a Semantic
Engine, a Dynamic Semantic Discovery System, and a uniform semantic
representation of Cloud Services and Cloud Patterns, agnostic and vendor
specific. |
An Overview of HPC and the Changing Rules at Exascale Jack Dongarra Innovative Computing Laboratory, Computer
Science Dept., University of Tennessee, Knoxville, TN, USA In this talk
we will look at the current state of high performance computing and look at
the next stage of extreme computing. With extreme computing there will be
fundamental changes in the character of floating point arithmetic and data
movement. In this talk we will look at how extreme scale computing has caused
algorithm and software developers to change their way of thinking about how to
implement and program certain applications. |
Preparing Applications for Next Generation
Architectures Sudip S.
Dosanjh National Energy Research Scientific Computing
Center Lawrence Berkeley National Laboratory,
Berkeley, CA, USA NERSC’s
primary mission is to accelerate scientific discovery at the DOE Office of
Science through high performance computing and data analysis. NERSC supports
the largest and most diverse research community of any computing facility
within the DOE complex, providing large-scale, state-of-the-art computing for
DOE’S unclassified research programs in alternative energy sources, climate
change, environmental science, materials research, astrophysics and other
science areas related to DOE’s science mission. NERSC’s
next supercomputer, Cori, is being deployed in 2016 in Berkeley Laboratory’s
new Computational Research and Theory (CRT) Facility. Cori will include over
9300 manycore Intel Knights Landing processors,
which introduce several technological advances, including higher intra-node
parallelism; high-bandwidth, on-package memory; and longer hardware vector
lengths. These enhanced features are expected to yield significant
performance improvements for applications running on Cori. In order to take
advantage of the new features, however, application developers will need to
make code modifications because many of today’s applications are not
optimized to take advantage of the manycore
architecture and on-package memory. To help
users transition to the new architecture, in 2014 NERSC established the NERSC
Exascale Scientific Applications Program (NESAP).
Through NESAP, several code projects are collaborating with NERSC, Cray and
Intel with access to early hardware, special training and “deep dive”
sessions with Intel and Cray staff. Eight of the chosen projects also will be
working with a postdoctoral researcher to investigate computational science
issues associated with manycore systems. The NESAP
projects span a range of scientific fields—including astrophysics, genomics,
materials science, climate and weather modeling,
plasma fusion physics and accelerator science—and represent a significant
portion of NERSC’s current and projected computational workload. Cori will
include many enhancements to enable a rapidly growing extreme data science
workload at NERSC. Cori will have a 1600 Intel® Haswell processor partition
with larger memory nodes to enable extreme data analysis. A fast internet
connection will let users stream data from experimental and observational
facilities directly into the system. A “Burst Buffer”, a 1.5 Petabyte layer
of NVRAM, will help accelerate I/O. Cori will also include a number of
software enhancements to enable complex workflows. For the
longer term we are investigating whether a single system can meet the
simulation and data analysis requirements of our users. For example, we are
adding a genome assembly miniapp (Meraculous) to our benchmark suite and we are considering
adding one for genome alignment (Blast). We are also investigating how data
intensive workflows (e.g., cosmology and genomics) differ from our simulation
workloads. |
Accelerating discovery with science services Ian
Foster Math & Computer Science Div., Argonne
National Laboratory & Dept of
Computer Science, The University of Chicago, Chicago, IL, USA Ever more
data- and compute-intensive science makes computing increasingly important
for research. But for the benefits of advanced computing to accrue to more
than the scientific 1%, we need new delivery methods that slash access costs
and new platform capabilities to accelerate the development of interoperable
tools and services. In this talk, I describe a set of such new methods and
report on experiences with their development and application. Specifically, I
describe how software-as-a-service methods can be used to move complex and
time-consuming research IT tasks out of the lab and into the cloud, thus
greatly reducing the expertise and resources required to use them. I also
describe how a new class of platform services can accelerate the development
and use of an integrated ecosystem of advanced science applications, thus
enabling access to powerful data and compute resources by many more people
than is possible today. |
Application and Software Classifications that
motivate Big Data and Big Simulation Convergence Geoffrey
Fox Community Grid Computing Laboratory, Indiana
University, Bloomington, IN, USA We
combine NAS Parallel Benchmarks, Berkeley Dwarfs, the Computational Giants of
NRC Massive Data Analysis Report and the NIST Big Data use cases to get an
application classification -- the convergence diamonds that link Big Data
and Big Simulation in a unified framework. We combine this with High
Performance Computing enhanced Apache Big Data software Stack HPC-ABDS and
suggest a simple approach to computing systems that support data management,
analytics, visualization and simulations without sacrificing performance. We
describe a set of "software defined" application exemplars using an
Ansible DevOps tool that we are producing. |
Implementing parts of HPC-ABDS in a
multi-disciplinary collaboration Geoffrey
Fox Community Grid Computing Laboratory, Indiana
University, Bloomington, IN, USA We introduce
the High Performance Computing enhanced Apache Big Data software Stack
HPC-ABDS and give several examples of advantageously linking HPC and ABDS. In
particular we discuss a Scalable Parallel Interoperable Data Analytics
Library SPIDAL that is being developed to embody these ideas and is the
HPC-ABDS instantiation of well known Apache
libraries Mahout and MLlib. SPIDAL covers some core
machine learning, image processing, graph, simulation data analysis and
network science kernels. It is a collaboration between teams at Arizona,
Emory, Indiana (lead), Kansas, Rutgers, Virginia Tech, and Utah universities. We give
examples of data analytics running on HPC systems including details on
persuading Java to run fast. |
Toward Democratization of HPC with Novel
Software Containers Wolfgang
Gentzsch The UberCloud,
Germany Countless
case studies demonstrate impressively the importance of High Performance
Computing (HPC) for engineering insight, product innovation, and market
competitiveness. But so far HPC was mostly in the hands of a relatively small
elite crowd, not easily accessible by the large majority of scientists and
engineers. In this presentation we argue that – despite the ever increasing
complexity of HPC tools, hardware, and system components – engineers have
never been this close to ubiquitous HPC, as a common tool, for everyone. The
main reason for this next progress can be seen in the continuous advance of
HPC software tools which assist enormously in the design, development, and
optimization of manufacturing products and scientific research. Now, we
believe that the next big step towards ubiquitous HPC will be made very soon
with new software container technology which will dramatically facilitate
software packageability and portability, ease the
access and use, and simplify software maintenance and support, and which
finally will pass HPC into the hands of every engineer. During
the past two years UberCloud has successfully built
HPC containers for application software from ANSYS, CD-adapco
STAR-CCM+, COMSOL Multiphysics, NICE DCV, Numeca
FINE/Marine and FINE/Turbo, OpenFOAM, PSPP, Red
Cedar HEEDS, Scilab, Gromacs,
and more. These application containers are now running on cloud resources
from Advania, Amazon AWS, CPU 24/7, Microsoft
Azure, Nephoscale, OzenCloud,
and others. In this presentation we will present the concept and benefits of
these novel software containers for engineering and scientific application,
and present a live and interactive demo with an engineering application in a
cloud container. |
Application-Specific Energy Modeling of Multi-Core Processors Vladimir
Getov Department of Engineering, Faculty of Science
and Technology University of Westminster, London, United
Kingdom During
the last decade, further developments of computer architecture and
microprocessor hardware have been hitting the so-called “energy wall” because
of their excessive demands for more energy. Subsequently, we have been
ushering in a new era with electric power and temperature as the primary
concerns for scalable computing. For several years, reducing significantly
the energy consumption for data processing and movement has been the most
important challenge towards achieving higher computer performance at exascale level and beyond. This is a very difficult and
complex problem which requires revolutionary disruptive methods with a
stronger integration among hardware features, system software and applications.
Equally important are the capabilities for fine-grained spatial and temporal
instrumentation, measurement and optimization, in order to facilitate
energy-efficient computing across all layers of current and future computer
systems. Moreover, the interplay between power, temperature and performance
adds another layer of complexity to this already difficult group of
challenges. Existing
approaches for energy efficient computing rely heavily on power efficient
hardware in isolation which is far from acceptable for the emerging
challenges. Furthermore, hardware techniques, like dynamic voltage and
frequency scaling, are often limited by their granularity (very coarse power
management) or by their scope (a very limited system view). More
specifically, recent developments of multi-core processors recognize energy
monitoring and tuning as one of the main challenges towards achieving higher
performance, given the growing power and temperature constraints. To address
these challenges, one needs both suitable energy abstraction and
corresponding instrumentation which are amongst the core topics of ongoing
research and development work. Since
current methodologies and tools are limited by hardware capabilities and
their lack of information about the application code, a promising approach is
to consider together the characteristics of both the processor and the
application-specific workload. Indeed, it is pivotal for hardware to expose
mechanisms for optimizing dynamically consumed power and thermal energy for
various workloads and for reducing data motion, a major component of energy
use. Therefore, our abstract model is based on application-specific
parameters such as power consumption, execution time, and equilibrium
temperature as well as hardware-specific parameters such as half time for
thermal rise or fall. Building upon this recent work, the ongoing and future
research efforts involve the development of a novel tuning methodology and
the evaluation of its advantages on real use cases. Experimental results
demonstrate the efficient use of the model for analyzing
and improving significantly the application-specific balance between power,
temperature and performance. |
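As a purely illustrative reading of the abstract model described above (the notation below is an assumption, not necessarily that used in the talk), one can let the processor temperature approach its application-specific equilibrium value exponentially, governed by the hardware-specific half time for thermal rise or fall, while the application energy follows from its average power draw and execution time:

    \[
      T(t) \;=\; T_{\mathrm{eq}} \;-\; \bigl(T_{\mathrm{eq}} - T_0\bigr)\, 2^{-t/t_{1/2}},
      \qquad
      E_{\mathrm{app}} \;\approx\; P_{\mathrm{app}} \cdot t_{\mathrm{exec}}
    \]

where T_0 is the temperature when the workload starts, T_eq the application-specific equilibrium temperature, t_{1/2} the half time for thermal rise or fall, and P_app and t_exec the application-specific power consumption and execution time.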
The New Code of Ethics: Justice and
Transparency in the Age of Big Data and Deep Learning Brett
Goldstein University of Chicago, USA As the
world is increasingly shaped by algorithms and machine learning, the
underlying code and data also reflects the world around it — including the
biases and discrimination that are embedded in society. Brett
Goldstein explores the broader implications of
the black box techniques and methods that are prevalent in data-driven
decision making in fields from law enforcement to product marketing, with a
particular focus on the implications of streaming sensor data and the
internet of things. Drawing on experiences from his time as the Chief Data
Officer for the City of Chicago and pioneering predictive crime analytics
with the Chicago Police Department, he suggests a framework for thinking
about these challenges based on transparency, awareness, and minimizing bias
rather than seeking to eliminate it completely. He outlines a vision for
proactive management of the concerns that arise in a high-performance computing
and algorithm-rich environment. |
Using Modern C++ with Multi-Staging for
Unified Programming of GPU Systems Sergei
Gorlatch Universitaet Muenster, Institut
für Informatik, Muenster,
Germany Writing and
optimizing programs on systems with Graphics Processing Units (GPUs) remains
a challenging task even for expert programmers. We present PACXX -- our approach to GPU
programming using exclusively C++, with the convenient features of modern
C++14 standard: type deduction, lambda expressions, and algorithms from the
standard template library (STL). Using PACXX, a GPU program is written as a
single C++ program, rather than two distinct host and kernel programs as in
OpenCL or CUDA. We extend PACXX with an easy-to-use and type-safe API for
multi-stage programming that allows for optimizations during code generation.
Using just-in-time compilation techniques, PACXX generates efficient GPU code
at runtime. Our evaluation shows that using PACXX allows
for writing GPU code easier and safer than currently possible in CUDA or
OpenCL, and that multi-stage programs can significantly outperform equivalent
non-staged versions. Furthermore, we show that PACXX generates code with high
performance, comparable to industrial-strength OpenCL compilers. |
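As a rough illustration of the single-source style described above (plain C++14, not the actual PACXX API), the sketch below expresses a SAXPY once, as a lambda passed to an STL algorithm; in a PACXX-like setting the same code would be staged and JIT-compiled for the GPU at runtime rather than executed on the host.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        const float a = 2.0f;
        std::vector<float> x(1024, 1.0f), y(1024, 3.0f);

        // One single-source "kernel": a C++14 lambda used with an STL algorithm,
        // instead of separate host and kernel programs as in OpenCL or CUDA.
        // A PACXX-like framework would JIT-compile this call for the GPU;
        // as plain C++14 it simply runs on the host.
        std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                       [a](float xi, float yi) { return a * xi + yi; });

        std::printf("y[0] = %f\n", y[0]); // expected 5.0
        return 0;
    }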
Progress in automatic GPU compilation and why
you want to run MPI on your GPU Torsten Hoefler Scalable Parallel Computing Lab., Computer
Science Department ETH Zurich, Zurich, SWITZERLAND Auto-parallelization of programs that have not been
developed with parallelism in mind is one of the holy grails in computer
science. It requires understanding the source code's data
flow to automatically distribute the data, parallelize the computations, and
infer synchronizations where necessary. We will discuss our new LLVM-based
research compiler Polly-ACC that enables automatic compilation to accelerator
devices such as GPUs. Unfortunately, its applicability is limited to codes
for which the iteration space and all accesses can be described as affine
functions. In the second part of the talk, we will discuss dCUDA, a way to express parallel codes in MPI-RMA, a
well-known communication library, to map them automatically to GPU clusters.
The dCUDA approach enables simple and portable
programming across heterogeneous devices due to programmer-specified
locality. Furthermore, dCUDA enables
hardware-supported overlap of computation and communication and is applicable
to next-generation technologies such as NVLINK. We will demonstrate
encouraging initial results and show limitations of current devices in order
to start a discussion. |
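To make the affine restriction concrete, here is a sketch (our own example, not taken from the talk) of the kind of loop nest a polyhedral compiler such as Polly-ACC can analyze: all loop bounds and array subscripts are affine functions of the loop indices and problem sizes, so the full iteration space and all accesses can be derived at compile time; replacing a subscript with an indirect access such as a[idx[i]] would put the code outside this class.

    #include <cstdio>
    #include <vector>

    // Affine loop nest: the bounds (n, m) and subscripts (i*m + j, i) are affine
    // in the loop indices, so the iteration space and data accesses are fully
    // analyzable and the nest can be mapped to an accelerator automatically.
    void scale_rows(std::vector<double>& a, const std::vector<double>& s, int n, int m) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j)
                a[i * m + j] = s[i] * a[i * m + j];
    }

    int main() {
        const int n = 4, m = 8;
        std::vector<double> a(n * m, 1.0), s = {1.0, 2.0, 3.0, 4.0};
        scale_rows(a, s, n, m);
        std::printf("a[last] = %f\n", a[n * m - 1]); // expected 4.0
        return 0;
    }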
Big Data Analytics on Vector Processor Takeo Hosomi System Platform Research Laboratories, NEC,
Kanagawa, JAPAN Almost every industry is starting to utilize the power of Big Data analytics, and IoT makes it possible to apply such analytics to physical systems by collecting sensor data from them. These analytics include many compute-intensive applications, such as image and video processing, data mining, and simulations. NEC has developed the SX series of vector supercomputers for HPC markets and is developing a next generation vector processor for both HPC and Big Data applications. In this talk, I will show some performance evaluation results of analytics applications on the current system, and I will also present the concept of NEC’s next generation vector supercomputer, called ‘Aurora’. |
Big Data and The Internet of Important Things Carl Kesselman Information Sciences Institute, University of
Southern California Marina del Rey, Los Angeles, CA, USA In the early days of Grid computing, the integration
of high-volume data producing instruments, such as the Advanced Photon Source
into an integrated computational pipeline was investigated and prototype
systems were produced that combined data production with data analysis. Now,
with the emergence of diverse types of scientific instruments and cloud
computing, it is time to revisit these combined observational and
computational systems. In this talk we consider an “internet of important
things” which consists of many diverse network connected instruments that are
dynamically interconnected and produce potentially large amounts of data that
must be securely and reliably obtained, combined, analyzed
and disseminated. I will describe some of the basic problems that must be
addressed, and provide an overview of a platform for managing the acquisition
and management of these data. |
CFD Codes on Multicore and Manycore Architectures David
Keyes King Abdullah University of Science and
Technology, Thuwal, Saudi Arabia Weak
scaling over distributed memory is well established for structured and
unstructured CFD simulations, as evidenced by (among other achievements of
the CFD community) Gordon Bell Prizes over the decades. Strong scaling over
shared memory to exploit multicore and manycore
architectures is less satisfactory to date. In this talk, we report on
separate campaigns to port to Intel multicore and manycore
environments successors to a pair of CFD codes that shared the 1999 Gordon
Bell Prize. Shared memory parallelization of the flux kernel of PETSc-FUN3D,
an unstructured tetrahedral mesh Euler flow code, is evaluated on Ivy Bridge,
Haswell, and KNC. We explore several thread-level optimizations to improve
flux kernel performance. In addition, a geometry-simplified fork of a widely
employed spectral element Navier–Stokes code,
Nek5000, has been co-designed with many algorithmic and implementation
innovations for the multicore and manycore
environments, using very high order elements, resolving duct flow at a
record-high Reynolds number 100,000. We emphasize features of the
computations at the algorithm-architecture interface. |
Borrowing Concepts from Social Media to
Enable Integration of Large-Scale Sensitive Data Sets Julia
Lane Wagner School, Center
for Urban Science and Progress New York University, New York, NY, USA Enabling
access to data is a fundamental first step to deriving value from their
content. That principle has been the driving force for much of the Open Data
movement across national and city governments, and resulted in a great deal
of citizen engagement. Access to large scale and sensitive data on human
beings is much more restricted. As a result there is much less understanding
of the availability and content of the datasets. This presentation describes
ways to improve that understanding that apply gamification approaches used in
social media. It provides a concrete example based on work with federal and
city administrative data. |
Update on a Keystone-based General Federation
Agent Craig
Lee Computer Systems
Research Dept., The Aerospace Corporation, El Segundo, CA USA Cloud federation
is an instance of general federation. That is to say, managing a federation
of cloud services is essentially no different from managing a set of
arbitrary, application-level services. While Keystone (the OpenStack security
service) was built to manage access to a set of local OpenStack cloud
services, it is actually quite amenable to managing arbitrary federations.
This talk presents our current work on using Keystone as a general federation
agent for arbitrary federations. |
Convergence of HPC and Bigdata Yutong Lu School of Computer Science National University of Defense
Technology China Nowadays,
advanced computing and visualization tools are enabling scientists and engineers to perform virtual experiments and analyze
large-scale datasets. Computing-driven and Bigdata-driven
scientific discovery has become a necessary approach in global environment,
life science, nano-materials, high energy physics
and other fields. Furthermore, the fast increasing computing requirements
from economic and social development also call for the birth of the Exascale system. This talk will discuss the convergence
of HPC and Bigdata on the Tianhe-2 system. |
Towards a Continuous Description of Compute
and Idle Phases in Scientific Parallel Applications Stefano Markidis KTH Royal Institute of Technology, Stockholm,
Sweden Parallel
scientific applications distribute their workload among several processes
that compute, communicate to and synchronize with other processes. Compute
and idle phases alternate in parallel applications. Idle periods on a process
are often generated by other late processes from which data is expected to be
received. In this
talk, we show that a scientific application can be considered as a continuous
medium in the limit of a very large number of processes as on current petascale and future exascale
supercomputers. Such a medium supports the propagation of idle periods among
processes in the same way air supports acoustic waves. We formulate an
equation to characterize the idle period propagation in parallel
applications. Its characteristic propagation velocity is determined by both
network parameters (latency and bandwidth) and application characteristics
(average compute time). This work
poses the basis for understanding how local process imbalance impacts
globally the overall application performance. It highlights the implication
of point-to-point communication among processes. We suggest that many cases
of unexpected performance degradation of parallel scientific applications can
be explained in terms of propagating idle periods. |
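One schematic way to write down the transport picture sketched above (an illustrative assumption, not necessarily the authors' actual formulation) treats the idle time u(x, t) at process coordinate x as a quantity advected along the chain of point-to-point dependencies, with a propagation velocity set by the average compute time t_c, the network latency L and the message size m over the bandwidth B:

    \[
      \frac{\partial u}{\partial t} \;+\; c\,\frac{\partial u}{\partial x} \;=\; 0,
      \qquad
      c \;\approx\; \frac{1}{\,t_c + L + m/B\,} \ \ \text{(processes per unit time)}
    \]

that is, an idle period created on one late process travels to its neighbours at roughly one process per compute-plus-communication interval.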
Consumable Analytics for Big Data Patrick
Martin School of Computing, Queen’s University,
Kingston, Ontario, Canada Consumable
analytics attempt to address the shortage of skilled data analysts in many organizations
by offering analytic functionality in a form more familiar to employees.
Providing consumable analytics for big data faces three main challenges: the
large volumes of data require efficient distributed algorithms; the analytics
must offer an easy interface to allow in-house experts to use these
algorithms while minimizing the learning cycle and existing code rewrites,
and the analytics must work on data of different formats stored on
heterogeneous data stores. The talk
will explore the above challenges and will give an overview of QDrill, which is a system we are developing to provide
consumable analytics for big data. QDrill extends
Apache Drill, a schema-free SQL query engine for cloud storage, in two ways.
The first is the inclusion of facilities to integrate any sequential
single-node data mining library into Drill and run its algorithms in a
distributed fashion from within the Drill SQL statements. The second is the
Distributed Analytics Query Language (DAQL), which provides users with a familiar
SQL interface to train and use analytical models. |
From FLOPS to BYTES: Disruptive
End of Moore’s Law beyond Exascale Satoshi
Matsuoka Global Scientific Information and Computing Center & Department of Mathematical and
Computing Sciences Tokyo Institute of Technology, Japan The
so-called “Moore’s Law”, by which the performance of the processors will
increase exponentially by a factor of 4 every 3 years or so, is slated to be ending in a 10-15 year timeframe due to the lithography of VLSIs reaching its
limits around that time, and combined with other physical factors. This is
largely due to the transistor power becoming largely constant, and as a
result, means to sustain continuous performance increase must be sought
otherwise than increasing the clock rate or the number of floating point
units in the chips, i.e., increase in the FLOPS. The promising new parameter
in place of the transistor count is the perceived increase in the capacity
and bandwidth of storage, driven by device, architectural, as well as
packaging innovations: DRAM-alternative Non-Volatile Memory (NVM) devices,
3-D memory and logic stacking evolving from VIAs to direct silicon stacking,
as well as next-generation terabit optics and networks. The overall effect of
this is that, the trend to increase the computational intensity as advocated
today will no longer result in performance increase, but rather, exploiting
the memory and bandwidth capacities will instead be the right methodology.
However, such shift in compute-vs-data tradeoffs
would not exactly be return to the old vector days, since other physical
factors such as latency will not change when spatial communication is
involved in X-Y directions. Such conversion of performance metrics from FLOPS
to BYTES could lead to disruptive alterations on how the computing system,
both hardware and software, would be evolving towards the future. |
Topology, Application and User Behavior Aware Job Resource Management in
Multidimensional Torus-Based HPC Systems Jarek Nabrzyski Department of Computer Science and
Engineering, University of Notre Dame & Center for
Research Computing & Great Lakes Consortium for Petascale Computation Notre Dame, Indiana, USA Communication
networks in recent machines often have multidimensional torus topologies,
which influences the way jobs should be scheduled in the system. This is the
case of such systems as Cray XE/XK with 3D torus, BlueGene/Q
with 5D, and the K computer with 6D torus. For example, BlueGene
allows allocating network links exclusively to the selected jobs to optimize
their performance, but it can leave unused nodes within the system
partitions, which leads to a lower system utilization. On Blue Waters, in
order to improve the application performance and runtime consistency, the system
adopts a contiguous allocation strategy. Each job is allocated a convex
prism, which reduces job-to-job interference and improves job performance,
but it degrades the system utilization. On the other hand, pure
non-contiguous allocation causes job performance to go down due to
communication interference and increased latency, which can lead to a
substantial variability of runtime. These reasons motivate our research in
which we develop and investigate various topology-aware scheduling algorithms
and strategies for the mapping of application to machine. New scheduling
methods that take an advantage of job and system topology, user behavior and application communication patterns will be
presented. Experimental results based on Blue Waters traces will demonstrate
the need for such new schedulers. |
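As a minimal sketch of the tradeoff described above (not the authors' scheduler), the following Python fragment computes how many idle nodes a contiguous "convex prism" allocation strands on a small 3D torus when a job needs fewer nodes than the smallest prism that can hold it. The torus size, job size and helper names are assumptions made purely for illustration.

    # Minimal sketch (not the authors' scheduler): idle nodes stranded by a
    # contiguous "convex prism" allocation on a small 3D torus.
    import itertools

    TORUS = (4, 4, 4)          # assumed 4x4x4 torus, 64 nodes

    def convex_prism(origin, shape):
        """All nodes inside an axis-aligned prism, wrapping around the torus."""
        ox, oy, oz = origin
        sx, sy, sz = shape
        return {((ox + i) % TORUS[0], (oy + j) % TORUS[1], (oz + k) % TORUS[2])
                for i, j, k in itertools.product(range(sx), range(sy), range(sz))}

    def stranded(requested, allocated):
        """Idle nodes left inside the partition by the contiguous policy."""
        return len(allocated) - requested

    job_nodes = 30                                 # job asks for 30 nodes
    prism = convex_prism((0, 0, 0), (4, 4, 2))     # smallest prism holding 30 has 32
    print("allocated:", len(prism), "idle inside partition:", stranded(job_nodes, prism))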
High Performance Analytics Services and
Infrastructures for Addressing Global Changes: the GEOSS perspective Stefano Nativi National Research
Council of Italy, Italy The
presentation deals with the role of High Performance Analytics services and
the related infrastructures in empowering advanced platforms dedicated to
Earth System Science and, in particular, to studying the effects of Global Changes.
The GEOSS (Global Earth Observation System of Systems) experience and
viewpoint will be discussed. |
Big Data Challenges in Simulation-based
Science Manish Parashar Dept. of Computer Science, Rutgers
University, Piscataway, NJ, USA Data-related
challenges are quickly dominating computational and data-enabled sciences,
and are limiting the potential impact of scientific applications enabled by
current and emerging extreme scale, high-performance computing environments.
These data-intensive application workflows involve dynamic coordination,
interactions and data coupling between multiple application processes that
run at scale on different resources, together with services for monitoring,
analysis, visualization and archiving, and they present challenges due to
increasing data volumes and complex data-coupling patterns, system energy
constraints, increasing failure rates, etc. In this talk I will explore data
grand challenges in simulation-based science and investigate how solutions
based on data sharing abstractions, managed data pipelines, in-memory
data-staging, in-situ placement and execution, and in-transit data processing
can be used to address these data challenges at extreme scales. |
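As a generic illustration of the in-situ idea mentioned above (not the speaker's framework), the Python sketch below runs a small analysis step inside the simulation loop so that only a reduced summary, rather than the full field, would need to leave the node; the placeholder "physics" and the analysis cadence are arbitrary assumptions.

    # Generic sketch of in-situ processing: analysis runs inside the
    # simulation loop on in-memory data instead of writing every step to
    # disk for later post-processing.
    import numpy as np

    def simulate_step(field: np.ndarray) -> np.ndarray:
        # placeholder "physics": a simple diffusion-like smoothing stencil
        return 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0)
                       + np.roll(field, 1, 1) + np.roll(field, -1, 1))

    def in_situ_summary(field: np.ndarray, step: int) -> None:
        # reduce the data while it is still in memory; only this small
        # summary would need to move off the node
        print(f"step {step}: mean={field.mean():.4f} max={field.max():.4f}")

    field = np.random.rand(256, 256)
    for step in range(5):
        field = simulate_step(field)
        if step % 2 == 0:          # analysis cadence chosen by the workflow
            in_situ_summary(field, step)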
Extreme Data Management Analysis and
Visualization for Exascale Supercomputers Valerio
Pascucci University of Utah, Center
for Extreme Data Management, Analysis and Visualization, Scientific Computing
and Imaging Institute School of Computing & Pacific Northwest National Laboratory,
USA Effective
use of data management techniques for analysis and visualization of massive
scientific data is a crucial ingredient for the success of any supercomputing
center and cyberinfrastructure for data-intensive
scientific investigation. In the progress towards exascale
computing, the data movement challenges have fostered innovation leading to
complex streaming workflows that take advantage of any data processing
opportunity arising while the data is in motion. In this
talk I will present a number of techniques developed at the Center for Extreme Data Management Analysis and
Visualization (CEDMAV) that make it possible to build a scalable data movement
infrastructure for fast I/O while organizing the data in a way that makes it
immediately accessible for analytics and visualization. In addition, I will
present a topological analytics framework that allows processing data in-situ
and achieving massive data reductions while maintaining the ability to explore
the full parameter space for feature selection. Overall,
this leads to a flexible data streaming workflow that allows working with
massive simulation models without compromising the interactive nature of the
exploratory process that is characteristic of the most effective data
analytics and visualization environments. |
Convergence of Memory and Computing Stephen Pawlowski Advanced Computing Solutions, Micron
Technology, Portland, OR, USA This talk will discuss the current trends of next
generation memory development. Based on this development, memory and logic
are becoming more dependent on each other for correct operation. Given this
linkage, opportunities for driving computing near memory and computing in
memory are once again starting to emerge and be realized. |
Quantum Annealing and the Satisfiability
Problem Kristen Pudenz Quantum Applications Engineering, Lockheed
Martin, Fort Worth, TX, USA The
utility of satisfiability (SAT) as an application-focused hard computational
problem is well established. We explore the potential of quantum annealing to
enhance classical SAT solving, especially where sampling from the space of
all possible solutions is of interest. We address the formulation of SAT
problems to make them suitable for commercial quantum annealers,
practical concerns in their implementation, and how the performance of the
resulting quantum solver compares to and complements classical SAT solvers. |
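As a reminder of the kind of reformulation involved (the talk's exact encoding is not given in the abstract), a single clause can be turned into a penalty that an annealer minimizes. For binary variables $x_i \in \{0,1\}$, the clause $(x_1 \lor x_2)$ is violated only when both variables are 0, so one possible penalty is

    P_{x_1 \lor x_2} = (1 - x_1)(1 - x_2) = 1 - x_1 - x_2 + x_1 x_2 ,

with a negated literal $\bar{x}_i$ handled by substituting $(1 - x_i)$, and clauses of three or more literals requiring auxiliary variables to stay quadratic. Summing one such penalty per clause gives a QUBO $H(x) = \sum_c P_c(x)$ whose minimum value is 0 exactly when the formula is satisfiable, so sampling low-energy states of $H$ samples satisfying (or near-satisfying) assignments.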
Convergence of HPC and Clouds for Large-Scale
Data Enabled Science Judy Qiu School of Informatics and Computing & Pervasive Technology Institute, Indiana
University, USA Scientific
discovery via advances in computational science and data analytics is an
ongoing national priority. A corresponding challenge is to sustain the
research, development and deployment of the High Performance Computing (HPC)
infrastructure needed to enable those discoveries. Early cloud data centers are being redesigned with new technologies to
better support massive data analytics and machine learning. Programming
models and tools are one point of divergence between the scientific computing
and big data ecosystems. The maturing of Cloud software around the Apache Big
Data Stack (ABDS) has gained striking community support, while there is
continued progress in HPC spanning up to exascale. Analysis of Big Data use cases identifies the
need for HPC technologies in ABDS. Deep learning,
using GPU clusters, is a clear example. But many machine learning algorithms
also need iteration, high performance communication and other HPC
optimizations. This rapid change in technology has further implications for
research and education. Our
research has concentrated on runtime and data management to support HPC-ABDS, evolving from standalone systems to
modules that can be used within existing software ecosystems. This work has
been driven by applications from bioinformatics, computer vision, network
science and analysis of simulations. We show promising results from this
approach of reusing HPC-ABDS to enhance three well-known Apache systems
(Hadoop, Storm and HBase) and construct what I
term Data-Enabled Discovery Environments for Science and Engineering (DEDESE). Our
architecture is based on using Map-Collective and Map-Streaming computation
models for an integrated solution to handle large data size, complexity and
speed. This is illustrated by our Harp plug-in for Hadoop, which can run
K-means, Graph Layout, and Multidimensional Scaling algorithms with realistic
application datasets over 4096 cores on the IU Big Red II Supercomputer and
Intel’s Xeon architectures while achieving linear speedup. Future goals
include an efficient data analysis library, where we are already looking at a
Latent Dirichlet Allocation topic model on Wikipedia data and subgraph isomorphism algorithms on
networks. Our preliminary results show that model data parallelism extends
our understanding of distributed and parallel computation, leading to further
advancements in handling high-dimensional model data and speed of
convergence. These findings will hopefully increase interest in using HPC
machines for Big Data problems, and we will continue to collaborate with national
centers in exploring their computational capabilities
and scientific applications. Short Bio: Judy Qiu
is an assistant professor of Computer Science at Indiana University. Her
general area of research is in data-intensive computing at the intersection
of Cloud and HPC multicore technologies. This includes a specialization on
programming models that support iterative computation, ranging from storage
to analysis, and that can scalably execute data-intensive
applications. Her research has been funded by NSF, NIH, Microsoft,
Google, Intel and Indiana University. Judy Qiu
leads a new Intel Parallel Computing Center (IPCC)
site at IU. She is the recipient of an NSF CAREER Award in 2012, Indiana
University Trustees Award for Teaching Excellence in 2013-2014, and Indiana
University Outstanding Junior Faculty Award in 2015. |
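The Map-Collective pattern behind the Harp K-means example described above can be pictured roughly as follows. This is an illustrative Python/mpi4py sketch, not the Harp Java API: each worker computes partial centroid statistics on its own shard (the "map" phase) and an allreduce combines them (the "collective" phase), so every worker holds the new global centroids. K, D, the random data and the iteration count are arbitrary assumptions.

    # Illustrative Map-Collective K-means step (not the Harp API).
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    K, D = 8, 16                       # assumed number of clusters / feature dim
    points = np.random.rand(10000, D)  # this worker's shard of the data

    def local_stats(points, centroids):
        """Per-worker 'map' phase: per-centroid sums of points and counts."""
        sums = np.zeros((K, D))
        counts = np.zeros(K)
        nearest = np.argmin(((points[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            mask = nearest == k
            sums[k] = points[mask].sum(axis=0)
            counts[k] = mask.sum()
        return sums, counts

    centroids = np.random.rand(K, D)
    for _ in range(10):
        sums, counts = local_stats(points, centroids)
        gsums, gcounts = np.empty_like(sums), np.empty_like(counts)
        comm.Allreduce(sums, gsums, op=MPI.SUM)      # the "collective" phase
        comm.Allreduce(counts, gcounts, op=MPI.SUM)
        centroids = gsums / np.maximum(gcounts, 1)[:, None]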
Lattice Boltzmann methods on the way to exa-scale Ulrich Ruede Lehrstuhl fuer Simulation, Universitaet Erlangen-Nuernberg,
Germany Lattice Boltzmann methods have become popular as an alternative
method to simulate complex flows. One of their strengths is in simulating
multiphase systems such as bubbly flows or foams. When
coupled with methods to model granular objects, the LBM can also be used to
simulate fluids with a suspended particulate phase. In this
talk we will report on our scalable, adaptive LBM implementation that can reach up
to a trillion (10^12) fluid cells on current Peta-Scale supercomputers. The practical
relevance of these methods will be illustrated with simulations of an
additive manufacturing (3D-printing) process. |
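For readers unfamiliar with the method, the core of a lattice Boltzmann update is a local collide step followed by streaming along a fixed set of lattice velocities. The single-node Python sketch below shows a minimal D2Q9/BGK version of that loop, purely for illustration; the speaker's scalable, adaptive implementation is of course far more elaborate (distributed, adaptive, and coupled to particulate phases).

    # Minimal single-node D2Q9 lattice Boltzmann sketch (BGK collision).
    import numpy as np

    nx, ny, tau = 64, 64, 0.6
    c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
    w = np.array([4/9] + [1/9]*4 + [1/36]*4)

    def equilibrium(rho, u):
        cu = np.einsum('qd,xyd->qxy', c, u)
        usq = (u ** 2).sum(-1)
        return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

    f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny, 2)))
    for step in range(100):
        rho = f.sum(0)                                  # macroscopic density
        u = np.einsum('qxy,qd->xyd', f, c) / rho[..., None]   # macroscopic velocity
        f += -(f - equilibrium(rho, u)) / tau           # collide (BGK relaxation)
        for q in range(9):                              # stream along each velocity
            f[q] = np.roll(np.roll(f[q], c[q, 0], axis=0), c[q, 1], axis=1)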
Role of Optical Interconnects in Extreme
Scale Computing Sébastien Rumley Lightwave Research Laboratory, Department of
Electrical Engineering School of Engineering and Applied Science,
Columbia University, USA Improving
interconnect performance is one of the key parts of meeting the Exascale challenge. With the core count heading toward
a billion, and the node count approaching the hundred-thousand mark, the amount
of data offloaded onto the interconnect every second is massive. There is a
high risk of seeing an even more massive amount of energy dissipated by the
interconnect if current technology is simply scaled. Furthermore, besides the
energy constraint, multi-core packages will eventually see their off-chip
bandwidth limited by pin-out limitations, which may cause a dramatic decrease
in computing efficiency. Photonic
technologies are among the best placed to alleviate these limitations.
Meter-scale communications in current supercomputers already rely on optical
cables, yet substantial progress can be achieved by better integrating these
links within the compute node structure. For instance, CMOS-compatible Silicon
Photonics optical devices can be integrated alongside the cores and caches on
the same die. Such deeper integration will not only curb energy
dissipation. It may also trigger important changes in node architectures. In
particular, the organization of memory hierarchies might get totally reviewed
if very large amounts of memory, not necessarily co-packaged with the main
compute die, can be accessed in a high-bandwidth and energy efficient way. Next to
improved integration of photonic end-points, transparent photonic switching
can also be leveraged to increase interconnect flexibility at low cost, or to
offload regular electronic packet routers. Yet photonic switching is subject
to very distinct rules and constraints, and its insertion in very large scale
architectures must be carefully engineered to be advantageous. In this
talk, we review the prospects of integrated Photonics. The main figures of
merit of future on-chip embedded optical transceivers and of optical switches
will be introduced. In light of these results, potential upcoming changes in
interconnect and node architectures will be sketched and discussed. |
Lawrence Berkeley National Laboratory,
Computing Research Division & National Energy Research Supercomputing
Center, USA We are facing a possible end to Moore’s Law in the
coming decade as photolithography reaches close to atomic scales —
challenging future technology scaling for computing systems. Already the 10nm
node is under-performing, and the 7-5nm technology nodes may be delayed
indefinitely due to lack of a compelling performance or economic advantage. Specialization is
one of the additional tools in our toolbox to further increase energy
efficiency in the face of flagging technology scaling improvements. However,
custom hardware (even using FPGAs) has prohibitively high design and
verification costs that have kept it in the margins for decades. The design
costs for digital logic MUST be brought down dramatically in order for
specialization to be cost-effective and agile. Recent advances in Domain-Specific-Languages for
hardware generation and other Hardware Description Languages (HDLs) have
reduced the barriers to hardware design. Whereas a new processor core has
typically taken hundreds of engineers several years to design and verify,
HDLs like Chisel have enabled a small team of engineers to implement, in 12
months, a family of RISC-V processor cores whose performance, energy
efficiency and area efficiency are competitive with or even superior to
commercial offerings. This demonstrates the power of these emerging HDLs and
the promise of an open-hardware ecosystem built upon this
infrastructure. We have embarked upon a program to create all of the
basic elements required for an open-source many-core chip architecture. OpenSoC uses the Chisel HDL to automate generation of
a large-scale Network-on-Chip that integrates processor cores,
memory controllers, and other peripherals and specialized accelerators into an integrated System-on-Chip package. This approach takes
the first steps towards an open ecosystem for hardware design and
specialization and provides the underpinnings for another decade or more of
technology scaling without the benefit of lithographic improvements. |
Exascale will be successful by 2025 . . . and then what? Lawrence Berkeley National Laboratory,
Computing Research Division & National Energy Research Supercomputing
Center, USA The US exascale project has made dramatic strides forward in
organizing a more detailed plan for a holistic strategy for delivering a
productive exascale system by 2023. Many challenges
remain to deliver useful application performance and the scientific
breakthroughs that justify the investment, but the roadmap has become much
clearer. The investment in exascale, however, is
not the end of the line. It is meant to deliver a “first of kind system” (not
just a one-off technical achievement). What happens AFTER the program ends in
2025? This talk
provides an updated view of what a 2023 system might look like and the
challenges ahead, based on our most recent understanding of technology
roadmaps. It also will discuss the tapering of historical improvements in
lithography that coincide with the completion of the exascale
project and how that might affect the roadmap beyond 2025. What options are
available to continue scaling for successors to the first exascale
machine? Will 2025 see a first-of-a-kind system, or will it arrive just in time
for a new computing technology upheaval? |
Co-design 3.0 – Configurable Extreme
Computing leveraging Moore’s Law for Real Applications Sadasivan Shankar Harvard University, School of Engineering and
Applied Sciences Cambridge, MA, USA In this
talk, we will discuss Co-design 3.0, a more adaptable and scalable paradigm
in which systems can be dynamically configured driven by the specific needs
of the applications. The premise is that, given the slowing of Moore’s law and
the power of computing to solve problems that address societal needs, we
need to focus on real applications as they evolve in time rather than on
standard benchmarks. For this to be practically viable, it should be done
in a scalable framework at lower cost. We think that major ongoing research
and development centers of computational and
physical sciences need to be formally engaged in the co-design of hardware,
software, numerical methods, algorithms, and applications. As we will demonstrate
with a few examples, this will help address grand scientific (technology)
challenges associated with the societal problems: materials and chemistry
(energy); biology (environment, health); information processing (computing
and communication). In addition, this will help in wider dispersion of the
benefits of computing, rather than confining them to niche scientific communities. In order
to accomplish this, it is likely that the computing framework that is
currently being used may be replaced by different information processing
architectures. As part of this talk, we will address the key applications and
their needs in these areas and illustrate a new class that we have developed
in which students are taught hands-on about using extreme computing to
address real applications. |
The Challenges of Exascale Computing Karl
Solchenbach Intel, Exascale
Labs Europe, GERMANY Building, operating and using exascale
systems requires the solution of several challenges:
- the performance/energy ratio has to improve by an order of magnitude;
- a new memory architecture is needed;
- applications have to become highly scalable, supporting 1M+ cores.
The presentation will address Intel’s efforts in
future HPC system architectures, including many-core nodes, high-bandwidth
interconnects, and new memory concepts. A special focus will be on
programming models and applications and the related work in the Intel Exascale Labs in Europe. In collaborations with leading
European HPC organisations Intel is establishing a co-design process to
define the requirements for future HPC systems and to evaluate future system
architectures. These systems won’t be pure number crunchers any more; they
will solve problems as a mix of HPC, high performance analytics, and
data-centric computing. |
The Asymptotic Computer – Undoing the Damage Thomas
Sterling School of Informatics and Computing & CREST Center
for Research in Extreme Scale Technologies Indiana University, Bloomington, IN, USA While the
very far future well beyond exaflops computing may
encompass such paradigm shifts as quantum computing or Neuromorphic
computing, a critical window of change exists within the domain of
semiconductor digital logic technology. As key parameters such as Dennard
scaling, nano-scale component densities, clock
rates, pin I/O, and voltage approach asymptotic operational regimes, one
major area of untapped opportunity is computer architecture which has been
severely limited by conventional practices of organization and control
semantics. Mainstream computer architecture in HPC has been inhibited in
innovation by the original von Neumann architecture of seven decades ago.
Although notably diverse in forms of parallelism exploited, six major epochs
of computer architecture through to the present are all von Neumann
derivatives. At their core (no pun intended) is the use of single instruction
issue and the prioritization of Floating Point ALU (FPU) utilization. At one
time, floating point hardware was the precious resource that motivated
architecture advances such as ILP, speculative execution,
prefetching, cache hierarchies, TLBs, branch prediction, execution
pipelining, and other architecture techniques. However, in the modern age,
FPUs consume only a small part of die real estate while the plethora of
mechanisms (including caches) to achieve maximum floating point efficiency
take up the vast majority of the chip. In the meantime, the von Neumann bottleneck,
the separation of memory and processor, is retained. A revolution in computer
architecture design is possible, even at the end of Moore’s Law, by undoing
the damage of the von Neumann heritage and emphasizing the key challenges of
data movement latency and bandwidth which are the true precious resources
along with operation/instruction issue control. This presentation will
discuss the key tradeoffs that should drive
computer architecture in what might be called the “Neo-Digital Age” and will
give three stages of advances that are practical even in today’s technology.
These include the author’s own work in ParalleX
architecture, the latent opportunity of Processor-in-Memory architectures,
and the adoption of future cellular architectures, each of which relaxes the
von Neumann architecture assumptions and exploits the inherent opportunities
of future computer architectures. |
DOE-NCI Joint Development of Advanced
Computing Solutions for Cancer Rick Stevens Argonne National Laboratory & Department of Computer Science, The
University of Chicago, USA The U.S.
has recently embarked on an “all government” approach to the problem of
cancer. This is codified in the “Cancer Moonshot”
initiative of the Obama administration led by Vice President Biden. In this
approach, all U.S. Government Agencies were requested to propose ways to
bring their resources and capabilities to bear on advancing cancer research. As part of
this initiative, the Department of Energy (DOE) has entered into a
partnership with the National Cancer Institute (NCI) of the National
Institutes of Health (NIH). This partnership has identified three key
challenges that the combined resources of DOE and NCI can accelerate. The
first challenge is to understand the molecular basis of key protein
interactions in the RAS/RAF pathway that is present in 30% of cancers. The
second challenge is to develop predictive models for drug response that can
be used to optimize pre-clinical drug screening and drive precision medicine
based treatments for cancer patients. The third challenge is to automate the
analysis and extraction of information from millions of cancer patient
records to determine optimal cancer treatment strategies across a range of
patient lifestyles, environmental exposures, cancer types and healthcare
systems. While
each of these three challenges is at a different biological scale and has
specific scientific teams collaborating on the data acquisition, data
analysis, model formulation, and runs of scientific simulations, they also
share several common threads. First, they are all linked by common sets of
cancer types that will appear at all three scales (i.e., molecular, cellular
and population), all have to address significant data management and data
analysis problems, and all need to integrate simulation, data analysis and
machine learning at large scale to make progress. I will outline the
strategy for attacking these problems, the scale of the problems and how we
plan to utilize Exascale computing. A major goal of
this effort is to drive requirements for future systems beyond those needed
for traditional scientific computing applications. Of particular priority are
the requirements for large-scale data analysis and the application of deep
learning to all three problems. |
The potential to augment HPC systems with
Neuromorphic Computing Accelerators Rick
Stevens Argonne National Laboratory & Department of Computer Science, The University
of Chicago, USA As more
businesses, researchers and governments embrace machine learning as a key
technology, we see the emergence of hardware accelerators optimized for
execution of machine learning algorithms (e.g. SVM, ensemble and tree methods,
lasso, multi-kernel methods, naïve Bayes and deep neural networks). NVIDIA,
Intel and others have projects to address this focused marketplace and we are
starting to see products placed into vendor roadmaps. The primary focus of
these mainstream efforts appears to be accelerating the training phase of
deep neural networks since this use case is very compute intensive and is a
major bottleneck in machine learning productivity. In addition to CPU and GPU
optimizations accomplished by providing support for reduced precision, other
groups have designed and/or built hardware that departs from simple
functional unit optimization to include entire data path optimization for
machine learning and in some cases are ASICs targeting a specific method or
software stack such as Google’s TensorFlow engine.
The more extreme, but still von Neumann, of these optimizations claim factors
of greater than 100 in power efficiency over conventional GPUs. While it is
certainly the case that progress can be made in adapting CPUs and GPUs to
address the needs of large-scale machine learning, there may be major
additional factors of 10 to be achieved in power reduction if we can
demonstrate the effective use of neuromorphic hardware designs as an
effective execution platform for large-scale machine learning applications.
To date there is a considerable gap between the scale and scope of state of
the art deep learning applications and those that have effectively been run
on neuromorphic hardware. In this talk I’ll review what we know about power
and performance efficiency gains from custom but conventional accelerators
and compare that to what might be achieved with neuromorphic hardware and
then discuss a path for integration into traditional HPC ecosystems and why
this might be a good idea. |
Merging Data Science and Large Scale
Computational Modeling Francis
Sullivan IDA/Center for
Computing Sciences, Bowie, MD, USA Future exascale computers will have to be suitable for both data
science applications and for more “traditional” modeling
and simulation. However, data science applications are often posed as
questions about discrete objects such as graphs while problems in modeling and simulation are usually stated initially in
terms of classical mathematical analysis. We will present examples and
arguments to show that the two points of view are not as distinct as one
might think. Recognizing the connections between the two problem sets will be
essential to development of algorithms capable of exascale
performance. We first touch briefly on methods such as particle swarm
optimization, which seem to be ad hoc and suited for discrete machine
learning applications but, when examined closely, can be seen to behave very
much like classical methods such as gradient descent. And classical problems,
such as inverting a Laplacian matrix, can be stated in terms of properties of
spanning trees and cycle space of a graph. Our main examples will be from
applications of Monte Carlo to attacking hard problems of the kind that occur
both in data science and in computational modeling
of physical phenomena. In the case of classical problems, methods for
determining convergence are often non-rigorous but capable of supplying a
physically meaningful answer. In the discrete world, by contrast, rigorous
results exist but establish complexity bounds that lead to methods that cannot be
used in practice. We will illustrate how taking ideas from both worlds pays
handsome dividends. |
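One small example of the "non-rigorous but physically meaningful" convergence determination mentioned above (not taken from the talk) is a Monte Carlo stopping rule based on the running standard error of the estimate rather than on a worst-case complexity bound; the tolerance, sample cap and the pi/4 test problem in this Python sketch are arbitrary choices for illustration.

    # Heuristic Monte Carlo stopping rule: stop when the running standard
    # error falls below a tolerance (practical, but not a rigorous bound).
    import random

    def estimate(f, sampler, tol=1e-3, max_samples=10**6):
        total, total_sq, n = 0.0, 0.0, 0
        while n < max_samples:
            x = f(sampler())
            total += x
            total_sq += x * x
            n += 1
            if n > 1000:
                mean = total / n
                var = max(total_sq / n - mean * mean, 0.0)
                if (var / n) ** 0.5 < tol:       # heuristic convergence check
                    return mean, n
        return total / n, n

    # Example: estimate pi/4 as the fraction of random points inside the unit quarter-circle.
    mean, n = estimate(lambda p: 1.0 if p[0]**2 + p[1]**2 <= 1 else 0.0,
                       lambda: (random.random(), random.random()))
    print(f"estimate={mean:.4f} after {n} samples")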
From Clouds to Exascale:
Programming Issues in Big Data Analysis Domenico
Talia Department of Computer Engineering,
Electronics, and Systems University of Calabria, Italy Scalability
is a key feature for big data analysis and machine learning tools and
applications that need to analyze the very large and real-time data nowadays
available from data repositories, social media, sensor networks, smartphones
and the Web. Scalable big data analysis today can be achieved by parallel
implementations that are able to exploit the computing and storage facilities
of HPC systems and clouds, whereas in the near future exascale systems will be
used to implement extreme-scale data analysis. In this talk we discuss how clouds
currently support the development of scalable data mining solutions and
outline the main challenges to be addressed and solved for implementing
future data analysis exascale systems. |
Kinetic Turbulence Simulations on Top
Supercomputers Worldwide William
Tang Princeton University, Dept. of Astrophysical
Sciences, Plasma Physics Section Fusion Simulation Program, Princeton Plasma
Physics Lab. & Princeton Institute for Computational Science
and Engineering, USA A major
challenge for high performance computing (HPC) today is to demonstrate how
advances in supercomputing technology translate to accelerated progress in
key application domains. This is the focus of an exciting program being
launched in the US -- the “National Strategic Computing Initiative (NSCI)” –
that was announced as an Executive Order on July 29, 2015, involving all
research & development (R&D) programs in the country to “enhance
strategic advantage in HPC for security, competitiveness, and discovery.” A
strong associated focus in key application domains is to accelerate progress
in advanced codes that model complex physical systems -- especially with
respect to reduction in “time-to-solution” as well as “energy to solution.”
If properly validated against experimental measurements/observational data
and verified with mathematical tests and computational benchmarks, these
codes can be expected to improve much-needed predictive capability in many
strategically important areas of interest. As an
example application domain, computational advances in plasma physics and
magnetic fusion energy research have produced particle-in-cell (PIC)
simulations of turbulent kinetic dynamics for which computer run-time and
problem size scale very well with the number of processors on massively
parallel many-core supercomputers. For example, the GTC-Princeton (GTC-P)
code, which has been developed with a “co-design” focus, has demonstrated the
effective usage of the full power of current leadership class computational
platforms worldwide at the petascale and beyond to
produce efficient nonlinear PIC simulations that have advanced progress in
understanding the complex nature of plasma turbulence and confinement in
fusion systems for the largest problem sizes. Results have also provided
strong encouragement for being able to include increasingly realistic
dynamics in extreme-scale computing campaigns with the goal of enabling
predictive simulations characterized by unprecedented physics resolution/realism
for increasing problem size challenges. More generally, from a performance
modelling perspective, important “lessons learned” from these studies hold
significant promise for benefiting particle-in-cell (PIC) based software in
other application domains. |
Towards Support of Highly-Varied Workloads on
Supercomputers Adrian
Tate Cray EMEA Research Lab. United Kingdom The talk will describe current research projects
that broadly support the seamless execution of highly varied, data-intensive
workloads on Supercomputers. Systems of the near future will run jobs
including any mixture of compute-intensive, data-intensive processing, data
analytics, visualization and machine learning. As well as describing
fundamental software and hardware challenges, the talk will describe how more
integrated systems could be used to match pieces of varied workloads with
appropriate hardware. The need for advancements in some key software areas
will be described, including data- and memory-aware programming abstractions,
mathematical optimization support, new task schedulers and intelligent
runtimes. |
Globus Auth
Identity and Access Management Steve
Tuecke Computation Institute, The University of
Chicago, Chicago, IL, USA Globus Auth is a
foundational identity and access management (IAM) platform service, used for
brokering authentication and authorization interactions between end-users,
identity providers, applications (including web, mobile, desktop, and command
line), and services (including service to service). The goal of Globus Auth is to enable an extensible, integrated ecosystem of
applications and services for the research community. In this talk I will
introduce and demonstrate Globus Auth, and examine
how it can be used to enhance applications and services such as data portals
and science gateways with advanced IAM functionality. |
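As a concrete flavour of such a brokered authentication flow, the sketch below uses the Globus Python SDK (globus_sdk) to obtain tokens for a native (command-line) application; the client ID is a placeholder, and the exact method names and scopes should be checked against the SDK documentation.

    # Sketch of a native-app login flow with the Globus Python SDK.
    import globus_sdk

    CLIENT_ID = "..."  # placeholder: the client ID of your registered app
    client = globus_sdk.NativeAppAuthClient(CLIENT_ID)

    # Start an OAuth2 flow, send the user to the authorization URL,
    # then exchange the code they paste back for tokens.
    client.oauth2_start_flow(requested_scopes="openid profile email")
    print("Please log in at:", client.oauth2_get_authorize_url())
    auth_code = input("Paste the authorization code here: ").strip()
    tokens = client.oauth2_exchange_code_for_tokens(auth_code)
    print(tokens.by_resource_server)   # one token set per resource server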
Eric Van
Hensbergen ARM Research, Austin, TX, USA In late
2011 ARM started exploring the use of its technologies in high performance
computing through the FP7 Montblanc project. Starting at that time with
32-bit mobile processors, the aim was to investigate how well power-efficient
mobile cores could run HPC workloads. In the five years since that project
started, combined with efforts from the US Department of Energy’s FastForward program and the European Horizon 2020
program, the ARM architecture has entered the enterprise server market with
64-bit processors and will be announcing architecture extensions to better
address compute-intensive workloads while still maintaining energy
efficiency. We now believe we have the necessary compute capabilities to
address Exascale and are focusing our research
efforts on addressing scalability and memory bottlenecks with a particular
emphasis on both increasing utilization of existing resources and reducing
unnecessary data movement. We believe this will allow ARM solutions to
achieve higher performance while maintaining energy efficiency and will also
serve to improve ARM's capabilities in streaming analytics and other
forms of big data. This talk will overview the past, present, and future of
ARM technologies in both high-performance and data-intensive computing. |
How Well Do We Know Properties of Parallel
Algorithms? Vladimir
Voevodin Moscow State University, Research Computing Center, Moscow, RUSSIA The computing world is changing and all devices –
from mobile phones and personal computers to high-performance supercomputers
– are becoming parallel. At the same time, if the efficient usage of all the
opportunities offered by modern computing systems represents a global
challenge, it turns into a large number of challenges at extreme scale. Using
the full potential of parallel computing systems and distributed computing
resources requires new knowledge, skills and abilities, where one of the main
roles belongs to understanding the key properties of parallel algorithms. What
are these properties? What should be discovered and expressed explicitly in
existing algorithms when a new parallel architecture appears? How to ensure
efficient implementation of an algorithm on an extreme scale computing
platform? All these as well as many other issues will be addressed in the
talk. The idea that we use in our practice is to split a
description of an algorithm into two parts. This helps us to explain what a
good parallel algorithm is and what is important for its efficient
implementation. The first part describes algorithms and their properties. The
second part is dedicated to describing particular aspects of their
implementation on various computing platforms. The first part draws attention
to the key theoretical properties, and the second part puts emphasis on the
aspects fundamentally important in practice. This division is made
intentionally to highlight the machine-independent properties of algorithms which
determine their potential and the quality of their implementations on
parallel computing systems, and to describe them separately from a number of
issues related to the subsequent stages of coding and execution. In addition
to the classical algorithm properties such as serial complexity, we have to
deal with concepts such as parallel complexity, parallel structure,
determinacy, data locality, performance and scalability estimates,
communication profiles for specific implementations, and many other aspects. This approach has been successfully implemented as the
open encyclopedia AlgoWiki,
which is available to the computational community at
www.AlgoWiki-Project.org. |