HPC 2012


High Performance Computing, Grids and Clouds


An International Advanced Workshop




June 25-29, 2012, Cetraro, Italy









Final Programme





International Programme Committee




Global HPC Programs Academia and Scientific Research Hewlett Packard



Math & Computer Science Div. Argonne National Laboratory, and Computation Institute of the University of Chicago



Innovative Computing Laboratory Computer Science Dept., University of Tennessee

and Oak Ridge National Laboratory



Extreme Scale Computing at SANDIA National Laboratories



Math & Computer Science Div., Argonne National Laboratory

and Dept of Computer Science, The University of Chicago



Community Grid Computing Laboratory, Indiana University



University of Delaware, Department of Electrical and Computer Engineering



HPC Consultant Regensburg, formerly SUN Microsystems and Duke University



Universitaet Muenster, Institut für Informatik



Dept. of Electronics, Informatics and Systems, University of Calabria



Technical University Clausthal



University of Gdansk



PDC Center for High Performance Computing Royal Institute of Technology



Institute for Advanced Simulation, Juelich Supercomputing Centre



Computer Sciences Dept., University of Wisconsin



Distributed Systems Architecture Group, Dpt. de Arquitectura de Computadores y Automática, Facultad de Informática, Universidad Complutense de Madrid



INFN – National Institute of Nuclear Physics – Italy, EU-IndiaGrid2



Department of Mathematical and Computing Sciences, Tokyo Institute of Technology



Argonne National Laboratory



Center for Grid Research and Development National Institute of Informatics



University of Amsterdam



Dept. of Electronics, Informatics and Systems, University of Calabria



Institute for Interdisciplinary Information Sciences Tsinghua University





Organizing Committee


Lucio Grandinetti































Nvidia Corporation











ADVANCE project



Amazon Web Services

Free Amazon Web Services credits for all HPC 2012 delegates

Amazon is very pleased to be able to provide $200 in service credits to all HPC 2012 delegates. Amazon Web Services provides a collection of scalable high performance and data-intensive computing services, storage, connectivity and integration tools. From GPUs, to tightly coupled workloads on EC2; from 50k core scale out systems to map/reduce and Hadoop, utility computing is a good fit for a variety of HPC workloads.

For more information, visit our website: http://aws.amazon.com/hpc






The Chain Project



Convey Computer



Cray Inc.



E4 Computer Engineering



ENEA – Italian National Agency for New Technologies, Energy and the Environment (t.b.c.)












National Research Council of Italy

ICAR - Institute for High Performance Computing and Networks



Tsinghua University – Institute for Interdisciplinary Information Sciences




University of Calabria – Department of Electronics, Informatics and Systems



2012 Media Sponsors




HPCwire is the #1 resource for news and information from the high performance computing industry. HPCwire continues to be the portal of choice for business and technology professionals from the academic, government, industrial and vendor communities who are interested in high performance and computationally-intensive computing, including systems, software, tools and applications, middleware, networking and storage.






HPC in the Cloud is the only portal dedicated to covering data-intensive cloud computing in science, industry and the data center. The publication provides technology decision-makers and stakeholders in the high performance computing industry (spanning government, industry, and academia) with the most accurate and current information on developments at the point where high performance and cloud computing intersect.





Datanami is a news portal dedicated to providing insight, analysis and up-to-the-minute information about emerging trends and solutions in big data. The portal sheds light on all cutting edge technologies including networking, storage and applications, and their effect upon business, industry, government, and research. The publication examines the avalanche of unprecedented amounts of data and the impact the high-end data explosion is having across the IT, enterprise, and commercial markets.








Workshop Agenda

Monday, June 25th





9:00 – 9:10

Welcome Address

Session I


State of the Art and Future Scenarios


9:15 – 9:45

J. Dongarra

On the Future of High Performance Computing: How to Think for Peta and Exascale Computing


9:45 – 10:15

I. Foster

Big Process for Big Data


10:15 – 10:45


Scientific Computing Supported by Clouds, Grids and Exascale Systems


10:45 – 11:15

K. Takeda

Cloud computing for research and innovation


11:15 – 11:45



11:45 – 12:15

A. Szalay

Extreme Data-Intensive Scientific Computing


12:15 – 12:45

S. Wallach

Big data? - So what!?


12:45 – 13:00


Session II


Emerging Computer Systems and Solutions


17:00 – 17:30

F. Baetke

Technology Trends in High Performance Computing


17:30 – 18:00

J.P. Panziera

Efficient Architecture for Exascale Applications


18:00 – 18:30

W. Gentzsch

Fujitsu and the HPC Pyramid


18:30 – 19:00



19:00 – 19:30


Supercomputing and Big Data: where are the real boundaries and opportunities for synergy


19:30 – 20:00

S. Wallach

Big Data Approaches At Convey


20:00 – 20:10




Tuesday, June 26th




Session III


Advances in HPC Technology and Systems I


9:00 – 9:25

S. Sherlekar

Virtual Appliances for HPC

A confluence of Technology, Architectures & Algorithms


9:25 – 9:50

W. Hu

The Chinese Godson Microprocessor for HPC


9:50 – 10:15


Micro-virtualization for HPC


10:15 – 10:40


From Multi-Processor System-on-Chip to High Performance Computing


10:40 – 11:05

E. D’Hollander

Programming and Performance of a combined GPU/FPGA Super Desktop


11:05 – 11:35



11:35 – 12:00

M. Fatica

Efficient utilization of computational resources in hybrid clusters


12:00 – 12:25

J. Kowalik

Is heterogeneous computing a next mainstream technology in HPC?


12:25 – 12:50

T. Puzniakowski

Performance of OpenCL


12:50 – 13:00


Session IV


Advances in HPC Technology and Systems II


17:00 – 17:30

S. Gorlatch

A Uniform High-Level Approach to Programming Systems with Many Cores and Multiple GPUs


17:30 – 18:00


A Codelet Based Execution Model and Its Memory Semantics


18:00 – 18:30


Environments for Collaborative Applications on e-Infrastructures


18:30 – 19:00



19:00 – 19:30

A. Yonezawa

Applications on K computer and Advanced Institute of Computational Science


19:30 – 20:00

K. Miura

Open Petascale Libraries (OPL) Project


20:00 – 20:10




Wednesday, June 27th




Session V


Software and Architecture for Extreme Scale Computing I


9:00 – 9:30

M. Seager

Future Exascale systems, so what’s different?


9:30 – 10:00


Software Implications of New Exascale Technologies


10:00 – 10:30

T. Sterling

Achieving Scalability in the Presence of Asynchrony


10:30 – 11:00

B. Lucas

Adiabatic Quantum Computing


11:00 – 11:30



11:30 – 12:00

S. Dosanjh

Exascale Design Space Exploration


12:00 – 12:30

T. Lippert

The EU Exascale Project DEEP - Towards a Dynamical Exascale Entry Platform


12:30 – 13:00


Hybrid system architecture and application


13:00 – 13:10


Session VI


Software and Architecture for Extreme Scale Computing II


16:30 – 17:00


Extreme Scale Computational Science Challenges in Fusion Energy Research


17:00 – 17:30

N. Bates

Achieving the 20MW Target: Energy Efficiency for Exascale


17:30 – 18:00



18:00 – 20:00


PANEL DISCUSSION: Five years into exascale exploration: what have we learned?

Chairman: P. Messina

Participants: F. Baetke, N. Bates, W. Blake, S. Dosanjh,

T. Lippert, Y. Lu, B. Lucas, K. Miura, R. Nair, M. Seager,

T. Sterling, W. Tang, S. Wallach




Thursday, June 28th




Session VII


Cloud Computing Technology and Systems I


9:00 – 9:25

V. Getov

Cloud Adoption Issues: Interoperability and Security


9:25 – 9:50

P. Martin

QoS-Aware Management of Cloud Applications


9:50 – 10:15

J. Vazquez-Poletti

Automatic IaaS Elasticity for the PaaS Cloud of the Future


10:15 – 10:40

O. Kao

Stratosphere - data management on the cloud


10:40 – 11:05

D. Talia

A Cloud Framework for Knowledge Discovery Workflows on Azure


11:05 – 11:35



11:35 – 12:00

G. Fox

FutureGrid exploring Next Generation Research and Education


12:00 – 12:25

P. Kacsuk

Executing Multi-workflow simulations on a mixed grid/cloud infrastructure using the SHIWA Technology


12:25 – 12:50

D. Petcu

Open-source platform-as-a-service: requirements and implementation challenges


12:50 – 13:00


Session VIII


Cloud Computing Technology and Systems II


15:45 – 16:10

Y. Tanaka

Building Secure and Transparent Inter-Cloud Infrastructure for Scientific Applications


16:10 – 16:35

J. Qiu

Scientific Data Analysis on Cloud and HPC Platforms


16:35 – 17:00

A. Goldman

The suitability of BSP/CGM model for HPC on Clouds


17:00 – 17:30


Session IX


BIG DATA and Data-Intensive Computing


17:30 – 17:55

V. Pascucci

Big Data Analytics for Science Discovery


17:55 – 18:20

W. Gentzsch

EUDAT - European scientists and data centers turn to big data collaboration


18:20 – 18:45

C. Catlett

Smart Cities and Opportunities for Convergence of Open Data and Computational Modeling


18:45 – 19:10

A. Choudhary

Discovering Knowledge from Massive Social Networks and Science Data - Next Frontier for HPC


19:15 – 20:15


PANEL DISCUSSION: Cloud Computing and Big Data: Challenges and Opportunities

Chairmen: C. Catlett and V. Getov

Participants: A. Choudhary, P. Martin, V. Pascucci, D. Talia




Friday, June 29th




Session X


Challenging Applications of HPC, Grids and Clouds


9:00 – 9:25

G. Tallant

High Performance Computing Challenges from an Aerospace Perspective


9:25 – 9:50

T. David

Macro-scale phenomena of arterial coupled cells: a Massively Parallel simulation


9:50 – 10:15

R. Dror

Overcoming Communication Latency Barriers in Massively Parallel Molecular Dynamics Simulation on Anton


10:15 – 10:40

C. Garcia Garino

Job scheduling of parametric computational mechanics studies on cloud computing infrastructure


10:40 – 11:05


Multi-Resolution Streams of Big Scientific Data: Scaling Visualization Tools from Handheld Devices to In-Situ Processing


11:05 – 11:35


Session XI


Advanced Infrastructures and Projects of HPC, Grids and Clouds


11:35 – 12:00

B. Di Martino

Portability and Interoperability in Clouds: Agents, Semantic and Volunteer computing can help - the mOSAIC and Cloud@Home projects


12:00 – 12:25

A. Wang

Smart Sensing for Discovering and Reducing Energy Wastes in Office Buildings


12:25 – 12:50


Project ADVANCE: Ant Colony Optimisation (ACO) using coordination programming based on S-Net


12:50 – 13:00







Paul Messina

Argonne National Laboratory

Argonne, IL





Gerhard Joubert

Technical University Clausthal





Jack Dongarra

Innovative Computing Laboratory

University of Tennessee


Oak Ridge National Laboratory

Knoxville, TN





Ian Foster

Argonne National Laboratory


Department of Computer Science

The University of Chicago

Argonne & Chicago, IL





Bill Blake

Cray Inc.

Seattle, WA










Wolfgang Gentzsch

HPC Consultant




SUN Microsystems and

Duke University, North Carolina










Bob Lucas

Computational Sciences Division

Univ. of Southern California

Information Sciences Institute

Los Angeles, CA





Patrick Martin

School of Computing

Queen’s University

Kingston, Ontario











Five years into exascale exploration: what have we learned?


It has already been five years since the first three workshops on exascale computing were organized. Since then, literally dozens of additional workshops on various aspects of exascale computing have been held, research and development efforts have been launched by various countries, computer manufacturers have worked on roadmaps that would lead to affordable exascale systems, and computational scientists have identified myriad exciting advances that such systems would enable. What lessons have we learned from these activities that might help guide the considerable additional R&D that is needed on component technologies, system architecture integration, programming models, and system and application software? The panelists will voice their opinions about the lessons learned and debate the most fruitful future directions.


Chairman: P. Messina


Panelists: F. Baetke, N. Bates, W. Blake, S. Dosanjh,

T. Lippert, Y. Lu, B. Lucas, K. Miura, R. Nair, M. Seager,

T. Sterling, W. Tang, S. Wallach




Cloud Computing and Big Data: Challenges and Opportunities


Cloud computing represents a fundamental shift in the delivery of information technology services and has been changing the computing landscape over the last several years. Concurrently, an increasing number of application areas are grappling with challenges related to the scale and/or complexity of data - collectively called "big data" challenges. In both areas we see commercial successes as well as continuing research challenges.

What are the overlaps between cloud computing, particularly at global scale, and big data? Is there room for working towards joint solutions? What classes of "big data" problems can be addressed via a cloud approach, and are there classes of data that are less effectively handled in a cloud environment? In this panel session, each of the panelists will present their position statements covering certain important aspects of this subject followed by a discussion of the future directions for research and development.


Chairmen: C. Catlett and V. Getov


Participants: A. Choudhary, P. Martin, V. Pascucci, D. Talia






On the Future of High Performance Computing: How to Think for Peta and Exascale Computing


Jack Dongarra

University of Tennessee

Oak Ridge National Laboratory


In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.

We will look at five areas of research that will have an important impact on the development of software and algorithms.

We will focus on the following themes:
• Redesign of software to fit multicore and hybrid architectures
• Automatically tuned application software
• Exploiting mixed precision for performance
• The importance of fault tolerance
• Communication avoiding algorithms



Big Process for Big Data


Ian Foster

Computation Institute

Argonne National Laboratory & University of Chicago, USA


We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams.

But we now face a far greater challenge: exploding data volumes and powerful simulation tools mean that far more (ultimately most?) researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?

Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., Google, Netflix, Amazon, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity.

More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.

I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date with the Globus Online system, and suggest a path towards large-scale delivery of these capabilities.



Scientific Computing Supported by Clouds, Grids and Exascale Systems


Geoffrey Fox

Community Grid Computing Laboratory

Indiana University

Bloomington, IN, USA


We divide scientific computing into classes of applications and analyze their suitability for different architectures, covering both compute and data analysis cases and both high-end and long-tail users. We propose an architecture for next generation Cyberinfrastructure and outline some of the research challenges.



Cloud computing for research and innovation


Kenji Takeda

Microsoft Research Connections EMEA

Cambridge, UK


Cloud computing is challenging the way we think about parallel and distributed computing, particularly in the context of HPC and the Grid. It opens up many possibilities for how research, development and businesses can use compute, storage and services on demand to exploit new opportunities across the whole spectrum of applications and domains. In this talk we discuss how the community has been exploring the use of Cloud Computing, including through the European Union Framework Programme 7 VENUS-C project, and the global Azure Research Engagement programme. We conclude with thoughts on how cloud computing is potentially reshaping the landscape of research and innovation.



Extreme Data-Intensive Scientific Computing


A. Szalay

Department of Physics and Department of Computer Science

Johns Hopkins University


Scientific computing is increasingly revolving around massive amounts of data. From physical sciences to numerical simulations to high throughput genomics and homeland security, we are soon dealing with Petabytes if not Exabytes of data. This new, data-centric computing requires a new look at computing architectures and strategies. We will revisit Amdahl's Law establishing the relation between CPU and I/O in a balanced computer system, and use this to analyze current computing architectures and workloads.

We will discuss how existing hardware can be used to build systems that are much closer to an ideal Amdahl machine. We have deployed various scientific test cases, mostly drawn from astronomy, over different architectures and compare performance and scaling laws. We discuss a hypothetical cheap, yet high performance multi-petabyte system currently under consideration at JHU.

We will also explore strategies of interacting with very large amounts of data, and compare various large scale data analysis platforms.
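
As an illustration of the balance argument, Amdahl's rule of thumb for a balanced system is commonly stated as roughly one bit of sequential I/O per second for every instruction per second. The sketch below computes that ratio (often called the Amdahl number). The function name and the hardware figures are hypothetical and are not taken from the talk.

```python
def amdahl_number(io_bytes_per_sec: float, instructions_per_sec: float) -> float:
    """Bits of sequential I/O per second divided by instructions per second.

    A value near 1.0 is the rule-of-thumb target for a balanced system;
    values far below 1.0 suggest the node is starved for I/O.
    """
    return (io_bytes_per_sec * 8.0) / instructions_per_sec


# Hypothetical node: 500 MB/s of sequential I/O and 10 billion instructions/s.
node_balance = amdahl_number(io_bytes_per_sec=500e6, instructions_per_sec=10e9)
print(f"Amdahl number: {node_balance:.2f}")
```

Under these assumed figures the node delivers well under one bit of I/O per instruction, which is the kind of imbalance the abstract refers to when it speaks of building systems closer to an ideal Amdahl machine.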



Big data? - So what!?


Steve Wallach

Convey Computer Corporation

Richardson, TX, USA


Big Data has been processed for decades. Classically the database size was constant or gradually increasing. With the advent of searching, directed advertisement, social networking, worldwide electronic messaging and web-based applications, the database increases in real time. This, coupled with the availability of petabytes of storage, naturally leads to the need for new types of power-aware computer architectures and knowledge discovery algorithms.


This talk will focus on new types of algorithms and architectures that are dynamically chosen based on the data type and database size.



Technology Trends in High Performance Computing


Frank Baetke

Global HPC Programs

Academia and Scientific Research

Hewlett Packard

Palo Alto, CA, USA


HP’s HPC product portfolio, which has always been based on standards at the processor, node and interconnect level, has led to a successful penetration of the High Performance Computing market across all application segments. The rich portfolio of the ProLiant BL-series and the well-established rack-based ProLiant DL family of nodes has been complemented by the SL-series with proven Petascale scalability and leading energy efficiency. Very recently this portfolio has been extended by a new family of servers announced under the name “Moonshot”.


Power and cooling efficiency is primarily an issue of cost, but also extends to the power and thermal density that can be managed in a data center. To leverage economies of scale, established HPC centers as well as providers of HPC Cloud services are evaluating new concepts which have the potential to make classical data center designs obsolete. Those new concepts provide significant advantages in terms of energy efficiency, deployment flexibility and manageability. Examples of this new approach, often dubbed POD for Performance Optimized Datacenter, including a concept to scale to multiple PFLOPS at highest energy efficiency, will be shown.


Finally, an outlook will be given towards system families due at the end of the decade that will provide performance in excess of 1000 Petaflops, or 1 Exaflop.



Efficient Architecture for Exascale Applications


Jean-Pierre Panziera

Extreme Computing Division Bull, France


Now that more Petaflop systems are becoming available, the HPC industry is turning to the next challenge: Exascale.
To achieve the Exascale goal before the end of the decade, the key issues to solve are scalability, resilience, energy optimisation and, most importantly, application efficiency. Application efficiency should be measured with parameters like "model size" or "runs per day and per Watt" rather than peak Flops. The future Exascale systems will be based on power efficient processing units which will feature many computational cores (1000s?). But the Exascale applications scaling to millions of cores have yet to be developed.
In the meantime, an intermediate generation of systems available in 2014-15 will preview the architecture of future Exascale systems and represent the platform for Exascale applications development and full scale testing.



Fujitsu and the HPC Pyramid


Wolfgang Gentzsch

Executive HPC Consultant

Fujitsu (external)


With the K Computer installed at the RIKEN Advanced Institute for Computational Science in Kobe, Japan, Fujitsu has re-joined the peak of the high performance computing pyramid. At the SC’11 HPC Challenge Awards, K received top-rankings in all four performance categories i.e. Linpack, RandomAccess, STREAM, and FFT.

While K and its commercial PRIMEHPC FX-10 joined the top systems, the x86 based PRIMERGY systems are completing the pyramid’s mid and bottom layers for mainstream HPC.

All Fujitsu HPC systems are bundled into user-friendly ready-to-go solutions consisting of HPC hardware, middleware, HPC portal, and services, providing ease-of-use HPC for the different application segments. In addition, collaboration is enabled by the SynfiniWay integrated software framework for virtualized distributed HPC.

This presentation aims at providing an overview of Fujitsu’s HPC solution portfolio, from top-end supercomputing, to mid-market HPC, and technical cloud computing. We will demonstrate how Fujitsu as the world's third-largest IT services provider drives innovation in high performance computing for industry and research.



Supercomputing and Big Data: where are the real boundaries and opportunities for synergy


Bill Blake

CTO and SVP, Cray, Inc., USA


Supercomputing provides an increasingly high-fidelity view of the world through numerically intensive modeling and simulation techniques that support complex decision making and discoveries in the scientific and technical fields. Big Data Analytics, as it is called today, also provides an accurate view of the world through data intensive search, aggregation, sorting and grouping techniques that support complex decision making and knowledge discovery in the web and business transaction fields. The talk will explore the architectures, data models and programming models of Supercomputing and Big Data and in particular the implications for Cray's Adaptive Supercomputing Vision.



Big Data Approaches At Convey


Steve Wallach

Convey Computer Corporation

Richardson, TX, USA


An overview of the architectural aspects, both hardware and software, of Convey’s thrust into data intensive computing.



Virtual Appliances for HPC

A confluence of Technology, Architectures & Algorithms


Sunil Sherlekar

Parallel Computing Research


Bangalore, India


For an engineer engaged in the design of (say) an aircraft, the ideal design tool is a computational Appliance: one that would be optimised for his/her computational needs in terms of performance, cost of capital (hardware), cost of operation (power consumption) and user interface. For aircraft design, these computational needs would typically involve computing the lift that the wings would generate and the atmospheric drag that the aircraft would experience. A similar scenario can be painted for any designer who uses simulation and optimisation in his/her design flow.


A custom-built appliance, right down to the compute engines in silicon, is, however, an expensive proposition, both in terms of design and fabrication. This becomes more so as we progress further into nanometre semiconductor fabrication technologies. Over the years, therefore, using general-purpose compute engines or processors has become commonplace.


For the last three decades or so, processors have shown a steady improvement in performance. Most of this is an outcome of Moore’s “Law” that envisages increasing circuit densities resulting in increasing clock frequency (and hence higher rates of execution of instructions) for processors. Further increase in performance has come about from architectural innovations: pipelining, branch prediction and out-of-order execution to speed up sequential programs; vector instructions for fine-grained parallelism and hyper-threading for coarse-grained parallelism.


A virtuous cycle has been established between the semiconductor industry and application developers: while application developers eagerly use up increasing processor performance, they also set the expectation of higher performance from future processors.


Over the last few years, however, this fairy-tale-like increase in clock frequency has hit a wall. This is because increasing clock frequency means increasing power consumption. Besides the economic downside of higher operating costs for HPC, this has now created the additional problem of dissipating the resulting heat.


The only way to tackle the problem of heat dissipation is to produce less heat! The only way to produce less heat is to operate the processors at a lower clock frequency and lower operating voltage. If this is done, it also means — unfortunately — that each processor also has a lower performance! A lower performance at the system level is, of course, not acceptable.


The semiconductor industry has tackled this dilemma by providing increasing performance through a technique that the HPC community has always used: increasing parallelism! The increasing proliferation of multi-core and many-core chips is a result of this strategy.
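
The trade-off described above can be sketched with the usual first-order model of CMOS dynamic power, P ≈ C·V²·f. This is a simplification that ignores leakage power, and it assumes the workload scales perfectly across cores; the scaling factors below are illustrative assumptions, not figures from the talk.

```python
def relative_power(cores: int, voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to one core at nominal voltage and frequency.

    Uses the first-order model P ~ C * V^2 * f, with the capacitance C
    assumed identical for every core (leakage is ignored).
    """
    return cores * voltage_scale ** 2 * frequency_scale


def relative_throughput(cores: int, frequency_scale: float) -> float:
    """Aggregate throughput relative to one nominal core, assuming perfect scaling."""
    return cores * frequency_scale


# Illustrative assumption: two cores at half the clock and 80% of the voltage.
power = relative_power(cores=2, voltage_scale=0.8, frequency_scale=0.5)
throughput = relative_throughput(cores=2, frequency_scale=0.5)
print(f"throughput x{throughput:.2f} at power x{power:.2f}")
```

Under these assumptions the two slower cores match the throughput of one fast core while drawing roughly two thirds of the dynamic power, which is the arithmetic behind the move to multi-core and many-core chips.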

The multi-core chips from Intel’s Xeon family provide for fine-grained parallelism through vector instructions or SIMD and coarse-grained parallelism through several cores on the same chip. This idea is taken further in Intel’s Knights or MIC (Many Integrated Core) family. MIC provides for even greater parallelism through a larger SIMD width and a much larger number of cores on one chip. KNC, the first in this family to be made commercially available, provides a 1 TF performance on DGEMM as announced during SC’11.


In the future, as we go into smaller fabrication process geometries, increasing performance will be provided through increasing parallelism on a chip while attempting to keep the power dissipation per chip constant. This will require addressing several issues:

• Reducing operating voltage while avoiding bit errors or minimising them and handling them at “higher levels” through error correction.
• Reducing bus power by using techniques such as current-mode signalling.
• Developing circuit design techniques to handle variations in transistor characteristics with a minimum impact on performance.
• Avoiding clocking and using “transition signalling” where possible.


The other serious “wall” the semiconductor industry faces today is that of moving data. This problem has two facets. First, while the speed of moving data is increasing, it is not keeping pace with the increasing speed of computation. This is true both for moving data between memory and processors and for moving data between compute nodes in a system. This means the overall speed of computation is increasingly being limited by the bandwidth of memory and of interconnect networks. Second, the reduction in power consumed per unit of computation is happening faster than the reduction in power consumed to move data. This means it is getting increasingly cheaper, in terms of power consumption, to perform computation on data than to move it around!
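
One common way to quantify the bandwidth limit described above is the roofline model, which is not mentioned in the talk and is used here only as an illustration. It bounds attainable performance by the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (floating-point operations per byte moved). The machine parameters below are hypothetical.

```python
def attainable_flops(peak_flops: float,
                     mem_bandwidth_bytes_per_sec: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bandwidth_bytes_per_sec * flops_per_byte)


# Hypothetical machine: 1 Tflop/s peak compute, 100 GB/s memory bandwidth.
PEAK = 1e12
BANDWIDTH = 100e9

# A streaming kernel doing 1 flop per 8 bytes moved is bandwidth-bound.
streaming = attainable_flops(PEAK, BANDWIDTH, flops_per_byte=0.125)

# A kernel with high data reuse (20 flops per byte) is compute-bound.
high_reuse = attainable_flops(PEAK, BANDWIDTH, flops_per_byte=20.0)

print(f"streaming kernel:  {streaming / 1e9:.1f} Gflop/s")
print(f"high-reuse kernel: {high_reuse / 1e9:.1f} Gflop/s")
```

On this assumed machine a kernel needs at least 10 flops per byte to reach peak, so any algorithm that moves less data per operation moves closer to the compute limit.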


The technologies being pursued by the semiconductor industry to tackle the data movement wall include:

• Bringing the memory closer to the processors and increasing the data bus width by using the chip area and not just the perimeter (3D chip stacking with TSVs or Through-Silicon Vias).
• Increasing the data rate by using optical signals. While Intel’s silicon photonics technologies help achieve this, electro-optical conversion at a miniaturised level is still a challenge.
• Better interconnect topologies.
• Obviating the constraints of topologies by using free-space communication using steered laser beams: still up in the air!


Even if all of the above technologies were to come to fruition, the problem will only be alleviated; it is quite unlikely that it will actually go away. The key, therefore, is to develop “communication avoiding” algorithms: those that reduce data movement even at the cost of increased computation. This can be done at several levels of abstraction.


Going forward, we at Intel are committed to expanding our design strategy to encompass a top-down approach. This means designing architectures that explicitly take the requirements of application developers into account. In the near to mid-term, the following are some of the ideas that may deserve consideration:

• Should the high-speed memory that can be created using 3D chip stacking be a program-addressable memory or a (last-level) cache?
• Should we have cache memory at all or should all memory be program-addressable? What are the implications for power consumption?
• For a program-addressable memory hierarchy (when the data traffic is program generated and not for cache coherence), what on-chip interconnect architectures would be most suitable?
• If all memory is program-addressable, can compiler technology alleviate the programmer’s burden to manage data transfer between various levels of memory?
• If, say for legacy reasons, it is necessary to have a cache hierarchy, would it help if the cache replacement policy were to take care of the data access patterns of a given application? Can data access patterns be characterised for this purpose? Would it still help to allow the programmer to define his/her own cache replacement policy?
• With a large (and perhaps increasing) SIMD width such as that on Intel’s MIC processors, would it help if, instead of SIMD, we could carry out more than one operation on different parts of the SIMD register? In particular, is VLIW better than SIMD? Should the architecture allow a programmer-controlled, application-driven trade-off?
• Are hardware blocks specific to application domains a good idea?


The point about all the above ideas — and many others — is not that they are particularly radical. It is that evaluating their impact in terms of various applications needs a huge investment in design time and prototyping costs. If this analysis can be carried out without the need of prototyping, it would be a great boon. As a first step, it would help create a formal description of hardware that is more abstract than RTL so as to be tractable but less abstract than ISA so as to be useful.


As a company, Intel’s commitment to application-driven architecture design is enabled by the fact that we can optimise all aspects of the design and fabrication process. In the final analysis, the biggest problems that need to be solved in the long term are those that involve fabrication. We are also committed to ensuring backward compatibility (to support all “legacy” applications) and to supporting a continuity of programming paradigms (to minimise programming effort).


This brings us back to the issue of providing Design Appliances which are tailored to specific application domains. Especially with the increasing cost of foundries that cater to nanometre-scale geometries, it seems impractical to use hardware that is application specific. We can arrive at a solution, however, by looking at the exact requirements of an HPC appliance:

§         Efficient computing that can solve HPC problems in a reasonable amount of time at a reasonable cost.

§         A user-interface that is tailored to the application domain and “talks” the language of the domain (instead of the language of computer science or electronics).

§         A service that is provided on-demand and independent of the location of the user.

A possible way of providing such appliances would be to use the “Cloud” model. This would entail:

§         Setting up several petascale HPC systems based on standard, general-purpose processors and a generous repertoire of application software.

§         Connecting these systems to one another and to all the users through a high-speed network.

§         Implementing application-specific, user-interface software on end-user devices for visualisation and interaction with the application software on the HPC systems.


Besides the continuing improvements in computing technologies, creating such appliances will need:

a)      Developing highly reliable, truly high-bandwidth wireless communication technologies. This is needed to support the transfer of the huge amounts of data that some applications generate to end-user devices on the go; and

b)      Developing flexible display panels that can be rolled up or folded to be easily carried and temporarily pinned or stuck on walls for use. This is to support high-quality visualisation of simulation results on the go.

If this is done, we would have created, for each application domain, a Virtual Appliance — something that combines the customised experience of a real appliance with the economy of a general-purpose shared system.



The Chinese Godson Microprocessor for HPC


Weiwu HU

Institute of Computing Technology

Chinese Academy of Sciences


The presentation will briefly introduce the Godson CPU roadmap for high performance computers (HPC). Until 2012, servers and HPC systems have used the same CPU. Given the goal of building a 100 PFLOPS HPC system in 2015, the CPU for HPC should reach TeraFLOPS performance.

Different CPUs will be designed for servers and HPC systems. The server CPU will use the traditional multi-core architecture, while the HPC CPU will use a many-core or long-vector architecture.

Bandwidth and power consumption limitations will be the big challenges for HPC CPU design.


Micro-virtualization for HPC


Dale Geldart

eXludus Technologies, Inc.

Corporate Headquarters

Montréal, Québec, CANADA


As core counts continue to rise, the need to safely and reliably run more concurrent tasks on each system also increases if we are to maximize processor and energy efficiency. Concurrently running more tasks, however, can lead to increased shared resource conflicts that can degrade efficiency, especially as in many cases memory per core is decreasing, which puts more pressure on memory resources. New lightweight micro-virtualization strategies can help users improve system efficiency while avoiding these shared resource conflicts.



From Multi-Processor System-on-Chip to High Performance Computing


Marcello Coppola

STMicroelectronics, Advanced System Technology, Grenoble Lab

Grenoble, France


Current high-end multicore architectures, when designed for maximum speed, waste available transistors, computation time, memory bandwidth and pipeline flow (optimized for sequential operation), resulting in a power efficiency that is one or two orders of magnitude away from what HPC demands. Today, architectures designed for mobile and embedded systems, employing energy-efficient components, represent a valid alternative to standard multicore architectures. In this presentation, some examples of MPSoC architectures used in high-end consumer markets are presented first. Next, we introduce how technology and innovative heterogeneous architectures could be used to implement modern HPC. Finally, we conclude the presentation by showing the power of MPSoC architectures in delivering substantial performance improvements in high-performance computing applications.



Programming and Performance of a combined GPU/FPGA Super Desktop


Erik D’Hollander

Ghent University, Belgium


The high performance of GPUs has made personal supercomputing a reality in many applications exhibiting single program multiple data parallelism. Programs with less obvious parallelism may be accelerated by field programmable gate arrays (FPGAs), which complement the computing power with a very flexible and massively parallel architecture.

Field programmable gate arrays provide a programmable architecture which allows an algorithm to be embedded into hardware and driven with data streams. A multicore CPU accelerated by GPUs and FPGAs is a hybrid heterogeneous system with a huge computational power and a large application area.

We present a super desktop computer consisting of a GPU and two FPGAs and describe the interconnections, the tool chain and the programming environment.

The performance of GPUs and FPGAs as accelerators of desktops and supercomputers is restricted by the traffic lanes between the processor and the accelerator. The roofline model by Williams et al. is able to represent both the raw computing performance and the input-output bottleneck in a single graph. Whereas the roofline is completely determined by the characteristics of processors with a fixed architecture, this is not the case for reconfigurable processing elements such as FPGAs. On the contrary, in this case the roofline model may be used to optimize the resource utilization and the input-output channels so as to obtain the maximum performance for a particular application. The design and quality of different hardware implementations of the same algorithm are enhanced by the strength of modern high-level synthesis tools such as AutoESL and ROCCC, which facilitate the development of powerful reconfigurable systems. We present the results for a number of image processing algorithms where the roofline model was used to obtain the maximum performance with a balanced resource usage and maximum input-output yield. It is shown that the modern high-level tools vary significantly with respect to development time and performance of the resulting computational architecture.
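The basic roofline bound can be written in a few lines. In its usual form, attainable performance is the smaller of the peak compute rate and the product of data bandwidth and arithmetic intensity (operations per byte moved). The Python sketch below is a minimal illustration; the function names and all numerical values are hypothetical and do not describe the system presented in the talk.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Roofline bound: limited either by compute or by data traffic."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gb_per_s):
    """Arithmetic intensity (flops/byte) above which a kernel is compute bound."""
    return peak_gflops / bandwidth_gb_per_s


# Hypothetical accelerator: 1000 GFLOP/s peak behind an 8 GB/s link.
PEAK = 1000.0
LINK = 8.0

low_intensity = attainable_gflops(PEAK, LINK, 0.5)     # limited by the link
high_intensity = attainable_gflops(PEAK, LINK, 200.0)  # limited by peak compute
balance = ridge_point(PEAK, LINK)
```

For a reconfigurable device the peak term is not fixed: it depends on how many processing elements the design instantiates, which is why the same bound can be used to balance resource usage against the input-output channel.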


Efficient utilization of computational resources in hybrid clusters


Massimiliano Fatica

NVIDIA Corporation

Santa Clara, CA, USA




Hybrid clusters composed of nodes accelerated with Graphics Processing Units (GPUs) are moving quickly from the experimental stage into production systems.

This talk will present two examples in which the computational workload is split between CPU cores and GPUs in order to fully utilize the computational capabilities of hybrid clusters.

The first example will describe a library that accelerates matrix multiplications, currently used in the CUDA accelerated HPL code and in quantum chemistry codes.

The second example is from TeraTF, a CFD code that is part of the SPEC-MPI suite.

In both cases close-to-optimal performance could be achieved by taking particular care of the data movement and by using a combination of MPI, OpenMP and CUDA.
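One simple way to picture splitting a workload between CPU cores and a GPU is a static partition of the rows of a matrix in proportion to each side's sustained throughput, so that both finish at about the same time. The Python sketch below assumes exactly that scheme; the function name and the throughput figures are hypothetical and are not taken from the library described in the talk.

```python
def split_rows(n_rows, gpu_gflops, cpu_gflops):
    """Return (gpu_rows, cpu_rows), proportional to sustained throughput."""
    gpu_share = gpu_gflops / (gpu_gflops + cpu_gflops)
    gpu_rows = round(n_rows * gpu_share)
    return gpu_rows, n_rows - gpu_rows


# Hypothetical node: the GPU sustains 300 GFLOP/s, the CPU cores 100 GFLOP/s.
gpu_rows, cpu_rows = split_rows(10000, 300.0, 100.0)

# With equal work per row, the time estimates for both parts then match.
gpu_time = gpu_rows / 300.0
cpu_time = cpu_rows / 100.0
```

In practice the sustained rates would have to be measured, and the cost of moving the GPU's share across the bus would shift the balance point.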


Is heterogeneous computing a next mainstream technology in HPC?


Janusz Kowalik

University of Gdansk

Gdansk, Poland


Heterogeneous computing is regarded as a technology on the path to exascale computation. However, current architectural and programming trends point to significant changes that may replace the notion of heterogeneous computing by the classic idea of SMP with massive parallelism. Hence the answer to the title question is a good topic for a workshop discussion.


Performance of OpenCL


Tadeusz Puźniakowski

University of Gdansk

Gdansk, Poland


The OpenCL standard is a relatively new standard that allows for computation on heterogeneous architectures. The first part of the presentation summarizes basic rules and abstractions used in OpenCL. The main part will contain the experimental results related to a linear algebra algorithm implemented with different methods of optimization and run on different hardware as well as the same algorithm run using OpenMP.



A Uniform High-Level Approach to Programming Systems with Many Cores and Multiple GPUs


Sergei Gorlatch

Universitaet Münster

Institut für Informatik

Münster, Germany


Application programming for modern heterogeneous systems which comprise multiple multi-core CPUs and GPUs is complex and error-prone.

Approaches like OpenCL and CUDA are low-level and offer support neither for multiple GPUs within a stand-alone computer nor for systems that integrate several computers. Distributed systems require programmers to use a mix of different programming models, e.g., MPI together with Pthreads, OpenCL or CUDA.


We propose a uniform approach based on the OpenCL standard for programming both stand-alone and distributed systems with GPUs.

The approach is based on two parts:

1) the SkelCL library for high-level application programming on stand-alone computers with multi-core CPUs and multiple GPUs, and

2) the dOpenCL middleware for transparent execution of OpenCL programs on several stand-alone computers connected over a network.

Both parts are built on top of the OpenCL standard which ensures their high portability across different kinds of processors and GPUs.


The SkelCL library offers a set of pre-implemented patterns (skeletons) of parallel computation and communication which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between a system's main memory and multiple GPUs.

The dOpenCL middleware extends OpenCL, such that arbitrary computing devices (multi-core CPUs and GPUs) in a distributed system can be used within a single application, with data and program code moved to these devices transparently.

In this talk, we describe SkelCL and dOpenCL and illustrate how they are used together to simplify programming of heterogeneous HPC systems with many cores and multiple GPUs.
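The idea of skeletons operating on a distributed vector can be illustrated without any GPU code. The following Python sketch is purely conceptual: it simulates "devices" as list chunks and composes a zip skeleton and a reduce skeleton into a dot product. None of the names below belong to the SkelCL or dOpenCL APIs; they are hypothetical and serve only to show the programming style.

```python
from functools import reduce


def distribute(vector, n_devices):
    """Split a vector into contiguous chunks, one per simulated device."""
    size = max(1, -(-len(vector) // n_devices))  # ceiling division
    return [vector[i:i + size] for i in range(0, len(vector), size)]


def zip_skeleton(f, chunks_a, chunks_b):
    """Apply f pairwise; each pair of chunks is processed independently."""
    return [[f(x, y) for x, y in zip(a, b)] for a, b in zip(chunks_a, chunks_b)]


def reduce_skeleton(f, chunks, identity):
    """Reduce each chunk locally, then combine the partial results."""
    partials = [reduce(f, chunk, identity) for chunk in chunks]
    return reduce(f, partials, identity)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]

products = zip_skeleton(lambda x, y: x * y, distribute(a, 2), distribute(b, 2))
dot = reduce_skeleton(lambda x, y: x + y, products, 0.0)
```

The application code only names the pattern and the customising function; how chunks are placed on devices and moved between them stays hidden behind the distribution helper, which is the kind of shielding the abstract describes.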


Environments for Collaborative Applications on e-Infrastructures


Marian Bubak

Department of Computer Science and ACC Cyfronet, AGH Krakow, Poland

Institute for Informatics, University of Amsterdam, Netherlands


Development and execution of e-science applications is a very demanding task. They are collaborative, used in dynamic scenarios (similar to experiments), and there is a need to link them with publications [8]. Most of them are used to solve problems which are multi-physics and multi-scale, which results in various levels of coupling of application components. Besides being compute intensive, more and more often they are also data intensive.


This talk presents and evaluates a few approaches to development and execution of such e-science applications on currently available e-infrastructures like grids and clouds [1]. Resources of these infrastructures are shared between different organisations and may change dynamically, so there is a need for methods and tools to master them in an efficient way [2].


We present the WS-VLAM workflow system which aims at covering the entire life cycle of scientific workflows: end-users are able to share workflows, reuse each other's workflow components, and execute workflows on resources across multiple organizations [3].

GridSpace [4] is a novel virtual laboratory framework that enables users to conduct virtual experiments on grid-based infrastructures. It facilitates exploratory development of experiments by means of scripts which can be expressed in a number of popular languages, including Ruby, Python and Perl. Some of the most demanding applications are those from the area of the Virtual Physiological Human. The Cloud Data and Compute Platform enables efficient development and execution of such applications by providing methods and tools to install services on available resources, execute workflows and standalone applications, and manage data in a hybrid cloud-grid infrastructure [5].

Common Information Space is a service-based framework for processing sensor data streams, running early warning system applications and managing their results. Although it was originally developed for building and running flood early warning systems, it may be applicable as an environment for any e-science application [6].

On top of GridSpace we have developed an environment for composing multi-scale applications [7] built from single-scale models implemented as scientific software components, distributed in various e-infrastructures.

Application structure is described with the Multiscale Modelling Language (MML). The environment consists of a semantic-aware persistence store to record metadata about models and scales, and a visual composition tool transforming a high-level MML description into an executable GridSpace experiment; finally, GridSpace supports execution and result management of the generated experiments.


The talk will be concluded with an analysis and evaluation of these different approaches to construction of environments supporting collaborative e-science applications.




[1] M. Bubak, T. Szepieniec, K. Wiatr (Eds.): Building a National Distributed e-Infrastructure - Pl-Grid. Scientific and Technical Achievements. Springer, LNCS 7136, 2012.


[2] J.T. Moscicki; M. Lamanna; M.T. Bubak and P.M.A. Sloot: Processing moldable tasks on the grid: Late job binding with lightweight user-level overlay, Future Generation Computer Systems, vol. 27, nr 6 pp. 725-736. June 2011. ISSN 0167-739X. (DOI: 10.1016/j.future.2011.02.002)


[3] Adam Belloum, Márcia A. Inda, Dmitry Vasunin, Vladimir Korkhov, Zhiming Zhao, Han Rauwerda, Timo M. Breit, Marian Bubak, Louis O. Hertzberger: Collaborative e-Science Experiments and Scientific Workflows. IEEE Internet Computing (INTERNET) 15(4):39-47 (2011)


[4] E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628 (October 2010)


[5] VPH-Share Cloud Platform: http://dice.cyfronet.pl/projects/details/VPH-Share


[6] Bartosz Balis, Marek Kasztelnik, Marian Bubak, Tomasz Bartynski, Tomasz Gubala, Piotr Nowakowski, Jeroen Broekhuijsen: The UrbanFlood Common Information Space for Early Warning Systems. Procedia CS 4: 96-105 (2011)


[7] Katarzyna Rycerz and Marian Bubak: Building and Running Collaborative Distributed Multiscale Applications, in: W. Dubitzky, K. Kurowsky, B. Schott (Eds), Chapter 6, Large Scale Computing, J. Wiley and Sons, 2012


[8] Marian Bubak, Piotr Nowakowski, Tomasz Gubala, Eryk Ciepiela: QUILT – Interactive Publications, FET11 – The European Future Technologies Conference and Exhibition, Budapest, May 4-6, 2011



Applications on K computer and Advanced Institute of Computational Science


Akinori Yonezawa

Advanced Institute of Computational Science (AICS)

Kobe, Hyogo, Japan


Some notable applications running on the K supercomputer will be presented, which include Tsunami simulations and mitigation of their damage as well as simulation of a whole human heart.

The talk also describes the RIKEN Advanced Institute of Computational Science (AICS), which is the research organization for the K computer and next-generation HPC.



Open Petascale Libraries (OPL) Project


Dr. Kenichi Miura

National Institute of Informatics, Tokyo, Japan


Fujitsu Laboratories Limited, Kawasaki, Japan


With the advent of the petascale supercomputing systems, we need to rethink the programming model and numerical libraries. For one thing, we need to make efficient use of multi-core CPUs. For example, the K Computer at RIKEN contains over 700 thousand cores, and features fast inter-core communication, sharing of the programmable L2 cache, and so on.


The Open Petascale Libraries Project has been initiated by Fujitsu Laboratories of Europe (FLE) to address this issue. It is a global collaboration that aims to promote the development of open-source thread-parallel and hybrid numerical libraries. My talk introduces the project, provides an update on progress, and seeks to obtain feedback from the wider community on future directions. At this time, the project includes: dense linear algebra, sparse solvers and adaptive meshing, Fast Fourier Transforms, and random number generators. In particular, I am interested in the development of highly scalable parallel random number generators, and a wider use of the Monte Carlo methods on petascale systems for various application areas.


Further information on the OPL Project is available at http://www.openpetascale.org/.



Future Exascale systems, so what’s different?


Mark Seager

INTEL Corporation

Santa Clara, CA, USA


The challenges of Exascale have been discussed at length. Addressing the power and resiliency challenges requires aggressive near-threshold voltage (NTV) circuit designs that actually make the resiliency problem worse. In this talk, I discuss a hierarchical approach to dealing with these issues and also the impacts on applications, algorithms, computation and communication methods, and I/O.



Software Implications of New Exascale Technologies


Ravi Nair

IBM Thomas J. Watson Research Center

Yorktown Heights, New York, USA


Continuing on the high-end high-performance computing trajectory towards Exascale requires overcoming several obstacles. A lot of attention has been paid in the community to the hardware challenges arising principally from the slowing down of Dennard scaling. Several innovative approaches have been proposed for dealing with these challenges. However, most of these approaches only add to the software hurdles that already need to be overcome in order to make Exascale systems successful. This talk will provide examples of hardware innovations that have been proposed or would be needed to build an Exascale system and will describe new software challenges that these innovations would present.


Achieving Scalability in the Presence of Asynchrony


Thomas Sterling, Ph.D

Professor of Informatics and Computing

Indiana University


The last 35 years of mainstream parallel computing have depended upon the assumption of synchronous operation: the expectation that the time measure of actions was knowable and exploitable in the management of physical resources and abstract actions. This was true with the architecture and programming methods for basic vector computing of the 1970’s, the SIMD Array processing systems of the 1980’s, and the communicating sequential processes based message passing programming of MPPs and commodity clusters of the 1990’s. This philosophy promoted explicit programmer specification of resource management and task scheduling with compile time assistance. Now in the Petaflops era with the inflation of the number of cores (either multicore sockets or GPU structures), widely disparate latencies, and algorithms exhibiting increasingly irregular structures and time varying response times, asynchronous behavior is increasingly manifest in terms of degradation of efficiency and limitations to scalability. Combined with the effects of overhead in determining effective granularity, and therefore indirectly concurrency, these factors may demand a revolutionary change to dynamic adaptive strategies through the implementation and application of runtime system software as an intermediary to mitigate asynchrony: the independence and uncertainty of timing of execution events. This presentation will borrow from the experimental ParalleX execution model to consider a set of runtime mechanisms (some from prior art in computer science research) that address these interrelated challenges all contributing to growing asynchrony within high performance computing systems and their implications for future architectures and programming methods that will enable Exascale computing by the end of this decade.



Adiabatic Quantum Computing


Bob Lucas

Computational Sciences Division

University of Southern California

Information Sciences Institute

Los Angeles, CA, USA


With the end of Dennard scaling, there appear to be three paths forward to greater computing capability: massive scaling of general purpose processors, purpose-built systems, or pursuit of new physical phenomena to exploit. Adiabatic quantum computing is an example of the last. It is a new model of computing, first proposed in 2000. The University of Southern California and the Lockheed Martin Corporation have formed a joint Quantum Computing Center, and have taken delivery of such a system, produced by the D-Wave Systems Corporation. This talk will discuss adiabatic quantum computing as realized in the D-Wave One and give an overview of early research results from the USC-Lockheed Martin Quantum Computing Center.



Exascale Design Space Exploration


Sudip Dosanjh

Extreme-scale Computing

Sandia National Laboratories


The U.S. Department of Energy's mission needs in energy, national security and science require a thousand-fold increase in supercomputing technology during the next decade. It will not be possible to build a usable exascale system within an affordable power budget based on computer industry roadmaps. Both architectures and applications will need to change dramatically. Although exascale is an important driver, these changes will impact all scales of computing from single nodes to racks to supercomputers. The entire computing industry faces the same power, memory, concurrency and programmability challenges. Exascale computing has additional challenges, notably scalability and reliability, that are related to the extreme size of systems of interest.


In order to influence the design of future systems we must partner with computer companies and application developers to explore the design space. Benefits of proposed changes must be quantified relative to costs. Costs could be related to energy and silicon area as well as software development. In order for computer companies to adopt changes the benefits must be quantified with trusted and validated models across a broad range of applications. The wider this range the easier it will be to leverage industry roadmaps. In the past it has been difficult to perform design tradeoff studies due to the lack of validated simulation/emulation tools and the complexity of HPC applications, which can be millions of lines of code.


Our proposed methodology for design space exploration is to use multi-scale architectural simulation coupled with mini- and skeleton- applications to analyze a range of abstract machine models. Close collaboration with application teams will be needed to enable the reformulation of key algorithms that accommodate machine constraints.


The EU Exascale Project DEEP - Towards a Dynamical Exascale Entry Platform


Thomas Lippert

Institute for Advanced Simulation, Jülich Supercomputing Centre

and University of Wuppertal, Computational Theoretical Physics,

and John von Neumann Institute for Computing (NIC)

also European PRACE IP Projects and of the DEEP Exascale Project



Since the beginning of 2012, a consortium of 16 partners from 8 countries led by the Jülich Supercomputing Centre, among them 5 industrial partners, has been engaged in developing the novel hybrid supercomputing system DEEP. The DEEP project is funded by the European Community under FP7-ICT-2011-7 as an Integrated Project, with co-funding by the partners. The DEEP concept foresees a standard cluster computer component complemented by a cluster of accelerator cards, called the booster. DEEP is an experiment that aims to adapt the hardware architecture to the hierarchy of different concurrency levels of application codes. Due to the cluster-booster concept, for a given code, cluster as well as booster resources can be assigned to different parts of the code in a dynamical manner, optimizing scalability. This is achieved through an adaptation of the cluster operating software ParaStation (ParTec) along with the parallel programming environment OmpSS (BSC). The major challenge for the concept is to achieve a proper and most efficient interaction between cluster and booster while minimizing the communication between both parts. Moreover, it is the combination of Intel's Many Integrated Core architecture (MIC, Intel Braunschweig) and the EXTOLL communication system (Uni Heidelberg) that allows the booster cards to be booted without an additional processor and promises unprecedented performance, scalability and energy efficiency of the booster system. Energy efficiency is further improved through hot water cooling technology (LRZ, EuroTech). Six European partners contribute by porting their applications, all of which exhibit more than one concurrency level and are expected to require Exascale performance in the future.



Hybrid system architecture and application


Yutong Lu

School of Computer Science

National University of Defence Technology

Changsha, Hunan Province, CHINA


With more and more Petaflops systems deployed, much debate concerns how we can use them efficiently. This talk introduces the efforts on the hybrid architecture and software of Tianhe-1A to address the performance, scalability and reliability issues. In addition, recent applications running on Tianhe-1A will be introduced to analyse the usability and feasibility of the hybrid system. A brief outlook on the next generation HPC system will also be given.



Extreme Scale Computational Science Challenges in Fusion Energy Research


William M. Tang

Princeton University, Princeton Plasma Physics Laboratory, USA


Advanced computing is generally recognized to be an increasingly vital tool for accelerating progress in scientific research in the 21st Century. The imperative is to translate the combination of the rapid advances in super-computing power together with the emergence of effective new algorithms and computational methodologies to help enable corresponding increases in the physics fidelity and the performance of the scientific codes used to model complex physical systems. If properly validated against experimental measurements and verified with mathematical tests and computational benchmarks, these codes can provide reliable predictive capability for the behavior of fusion energy relevant high temperature plasmas. The fusion energy research community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel supercomputers. A good example is the effective usage of the full power of modern leadership class computational platforms from the terascale to the petascale and beyond to produce nonlinear particle-in-cell simulations which have accelerated progress in understanding the nature of plasma turbulence in magnetically-confined high temperature plasmas. Illustrative results provide great encouragement for being able to include increasingly realistic dynamics in extreme-scale computing campaigns to enable predictive simulations with unprecedented physics fidelity. Some key aspects of application issues for extreme scale computing will be included within this brief overview of computational science challenges in the Fusion Energy Sciences area.



Achieving the 20MW Target: Energy Efficiency for Exascale


Natalie Bates

Energy Efficient HPC Working Group

Lawrence Berkeley National Laboratory

Berkeley, CA, USA


The growth rate in energy consumed by data centers in the United States has been declining in the past five years compared to its earlier accelerating pace. This reduced growth rate was achieved in large part due to energy efficiency improvements. Measuring, monitoring and managing usage has been key to making these improvements in energy efficiency. The metric Power Usage Effectiveness (PUE) has been effective in driving the energy efficiency of data centers, but it has limitations. PUE does not account for the power distribution and cooling losses inside the IT equipment, which is particularly problematic for HPC. Similarly, reporting performance and analyzing the amount of power used to run High Performance Linpack for a Top500 and/or Green500 submission has been successful in helping to drive improvements in supercomputing system energy efficiency. Power efficiency (Megaflops per watt) shows average efficiency nearly tripling between 2007 and 2011. But just as PUE isn’t perfect for the data center, there are also problems with the power/energy measurement methodologies, workloads and metrics for supercomputer systems. Work is actively being done on both of these topics. Performance analysis capabilities and tools have been honed over the past decades and today we must develop the same capabilities and tools for energy and power analysis. Achieving the 20 MW target for an Exascale system is challenging and will require shifts in architecture, technology and application usage models as well as tighter coupling between the data center infrastructure and the computer system. This talk will describe current work and industry trends as well as lay out a future roadmap for energy and power measurement capability, methodologies and metrics that will help us hit our 20 MW target.
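The two metrics mentioned above reduce to simple ratios, and the 20 MW target can be restated as a required efficiency. The Python sketch below assumes the usual definition of PUE (total facility power divided by IT equipment power) and takes "Exascale" to mean 10^18 floating-point operations per second; the function names and the sample facility figures are hypothetical.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: 1.0 would mean no facility overhead."""
    return total_facility_kw / it_equipment_kw


def required_mflops_per_watt(target_flops, power_budget_watts):
    """Efficiency needed to sustain target_flops within the power budget."""
    return target_flops / power_budget_watts / 1e6


# Hypothetical data center: 1500 kW drawn in total, 1000 kW reaching IT equipment.
example_pue = pue(1500.0, 1000.0)

# One exaflop per second within 20 MW.
exascale_target = required_mflops_per_watt(1e18, 20e6)
```

Under these assumptions the target works out to 50,000 Megaflops per watt, that is, 50 Gigaflops per watt. Note that PUE as computed here says nothing about losses inside the IT equipment itself, which is the limitation the abstract points out.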



Cloud Adoption Issues: Interoperability and Security


Vladimir Getov

School of Electronics and Computer Science

University of Westminster, London, U.K.


The concept of a hybrid cloud is an attractive one for many organisations, allowing an organisation with an existing private cloud to partner with a public cloud provider. This can be a valuable resource as it allows end users to keep some of their operation in-house, but benefit from the scalability and on-demand nature of the public cloud. There are, however, a number of issues that organisations must consider before opting for a hybrid cloud set-up. The single most pressing issue that must be addressed is that, by definition, the hybrid cloud is never ‘yours’ – part of it is owned or operated by a third party, which can lead to security concerns. With a true ‘private cloud’, hosted entirely on your own premises, the security concerns are no different to those associated with any other complex distributed system.

Indeed, ‘Cloud computing’ as a term has become very overloaded – it is doubtful whether this type of internal private cloud system qualifies as cloud computing at all, as it does not bring the core benefits associated with cloud computing, including taking the pressure off in-house IT resources and providing a quickly scalable “elastic” solution using the new pay-as-you-go business model. However, when this ‘private cloud’ is hosted by a third party, the security issues facing end users become more complex. Although this cloud is, in theory, still private, the fact that it relies on external resources means that IT managers are no longer in sole control of their data. Security remains a major adoption concern, as many service providers put the burden of cloud security on the customer, leading some to explore costly ideas like third-party insurance. It is risky, as well as impractical, to ignore the potential for losing expensive and/or sensitive data. Another issue that organisations must consider is interoperability – internal and external systems must work together before security issues can be considered.

It could be said, therefore, that a true hybrid cloud is actually quite difficult to achieve when interoperability and security issues are considered. One solution might be a regulatory framework that would allow cloud subscribers to undergo a risk assessment prior to data migration, helping to make service providers accountable and provide transparency and assurance. Concerns with hybrid cloud are indicative of the anxiety that many companies feel when considering cloud computing as a viable business option. We need to see a global consensus on regulation and standards to increase trust in this technology and lower the risks that many organisations feel go hand in hand with entrusting key data or processing capabilities to third parties. Once this hurdle is removed, the true benefits of cloud computing can finally be realised.


Back to Session VII

QoS-Aware Management of Cloud Applications


Patrick Martin

School of Computing, Queen’s University

Kingston, Ontario, Canada


Many organizations are considering moving their applications and data to a cloud environment in order to take advantage of its flexibility and potential cost savings. There are numerous challenges associated with making this move including selecting a cloud service provider, deploying and provisioning an application in the cloud to meet required QoS levels, monitoring application performance and dynamically re-provisioning as demand fluctuates in order to maintain QoS commitments and minimize costs.

In the talk I will propose a framework for QoS-aware management of cloud applications to address these challenges. I will discuss the structure of the framework and highlight the key research questions that must be answered in order to develop the framework.


Back to Session VII

Automatic IaaS Elasticity for the PaaS Cloud of the Future


Jose Luis Vazquez-Poletti

Dpt. de Arquitectura de Computadores y Automática

Universidad Complutense de Madrid



Cloud computing is fundamentally changing the way services are built, provided and consumed. Although access to Clouds is simple, building elastic services remains an elitist domain: proprietary technologies are an entry barrier, especially for SMEs, and elastic services consequently remain largely in the hands of established players. The 4CaaSt project (http://4CaaSt.eu/) aims to create an advanced PaaS Cloud platform which supports the optimized and elastic hosting of Internet-scale multi-tier applications. 4CaaSt embeds all the necessary features, easing programming of rich applications and enabling the creation of a true business ecosystem where applications coming from different providers can be tailored to different users, mashed up and traded together. This talk will describe the research efforts, involving SLA and Admission Control policy management, and technology enhancements in OpenNebula, involving support for vertical and horizontal service scalability, to address the requirements of elasticity in the project. We will use a computing provider scenario, offering HPC clusters as a service, to illustrate the benefits of the approach.


Back to Session VII

Stratosphere - data management on the cloud


Odej Kao

Complex and Distributed IT Systems

Technische Universität

Berlin, Germany


Data Intensive Scalable Computing is a much investigated topic in current research. Next to parallel databases, new flavors of data processors have established themselves - most prominently the map/reduce programming and execution model. The new systems provide key features that current parallel databases lack, such as flexibility in the data models, the ability to parallelize custom functions, and fault tolerance that enables them to scale out to thousands of machines.

In this talk, we will present the Nephele system – an execution engine for massively parallel virtualized environments centered around a programming model of so-called Parallelization Contracts (PACTs). Nephele is part of the larger Stratosphere system, which is as generic as map/reduce systems while overcoming several of their major weaknesses. The focus will be on the underlying cloud model, the execution strategies, the detection of communication bottlenecks and network topology, and light-weight fault tolerance methods.




Dr. Odej Kao is a Full Professor at the Technische Universität Berlin and head of the research group Complex and distributed IT systems. Moreover, he is the director of the IT service center of the TU Berlin called tubIT, which offers IT services to more than 40000 members of the TU Berlin. Finally, he is scientific advisor for distributed architectures at the Fraunhofer Institute for Software Technique and Computer Architecture FIRST.


Dr. Kao is a graduate of the Technische Universität Clausthal, where he earned a Master’s degree in Computer Science and Electrical Engineering in 1995. Thereafter, he spent two years working on his PhD thesis dealing with high performance image processing and defended his dissertation in December 1997. In his work as a PostDoc, Dr. Kao published many papers on high performance multimedia retrieval and was awarded an advanced PhD (habilitation) in March 2002.


In April 2002 Dr. Kao joined the University of Paderborn as Associate Professor for distributed and operating systems. One year later he became a managing director of the Paderborn Center for Parallel Computing (PC2), where he has conducted research and many industry-relevant projects in resource management and Grid computing. He was technical manager of the EU project HPC4U dealing with QoS fault tolerance for Grid infrastructures and coordinator of the large project AssessGrid for risk assessment and management in complex on-demand computing environments. Finally, in August 2006 he moved to the TU Berlin.


Since 1998, he has published over 220 peer-reviewed papers at prestigious scientific conferences and journals. Dr. Kao is a member of many international program committees and editorial boards of journals such as Parallel Computing. His research interests include Cloud computing, virtualisation, data and resource management, Quality of Service and SLAs, identity management, and peer-to-peer based resource description and discovery.


Back to Session VII

A Cloud Framework for Knowledge Discovery Workflows on Azure


Domenico Talia

Dept. of Electronics, Informatics and Systems

University of Calabria


Cloud platforms provide scalable processing and data storage and access services that can be effectively exploited for implementing high-performance knowledge discovery systems and applications. We designed a Cloud framework that supports the composition and scalable execution of knowledge discovery applications on the Windows Azure platform. Here we describe the system architecture, its implementation, and current work aimed at supporting the design and execution of knowledge discovery applications modeled as workflows.

Back to Session VII

Executing Multi-workflow simulations on a mixed grid/cloud infrastructure using the SHIWA Technology


Peter Kacsuk


Budapest, Hungary


Various scientific communities use different kinds of scientific workflow systems, each of which can run workflows on a specific DCI (Distributed Computing Infrastructure). The problem with the current workflow usage scenario is that user communities are locked into their workflow system, i.e., they cannot share their workflows with scientists working in the same field who have selected a different workflow system. They are also locked into the DCI that is supported by the selected workflow system, i.e., they cannot run their workflow application on another DCI that is not supported by the selected workflow system. The SHIWA technology avoids these pitfalls and makes it possible to share workflows written in various workflow languages among different user communities. It also enables the creation of so-called meta-workflows that combine workflow applications into a higher-level workflow system. The other important feature of the SHIWA technology is its support for multi-DCI execution of these meta-workflows on various grids and cloud systems. The talk will describe in detail how such meta-workflows can be created and executed on a mixed grid/cloud infrastructure.


Back to Session VII

Open-source platform-as-a-service: requirements and implementation challenges


Dana Petcu

West University of Timisoara and Institute e-Austria Timisoara, Romania


While at the infrastructure-as-a-service level the adoption of emerging standards is slowly progressing as a solution for interoperability in agreed or ad hoc federations of Clouds, the market of platforms-as-a-service is still struggling with the variety of proprietary offers and approaches, leading application developers into vendor lock-in.

Open-source platforms that are currently emerging as middleware built on top of multiple Clouds have a high potential to help the development of applications that are vendor-agnostic and a click away from the Clouds and, by this, to boost the migration towards the Clouds. Due to the complexity of such platforms, the number of existing solutions is currently low. We will present a short analysis of the available implementations, including VMware’s Cloud Foundry and Red Hat’s OpenShift, with a special focus on mOSAIC’s platform [1].

While fulfilling the user requirements, the platform also needs to automate the processes running on the providers’ sites. In this context, the special components to be developed when implementing an open-source platform are related to the main characteristics of the Cloud, such as elasticity (for example, through auto-scaling mechanisms) or high availability (for example, through adaptive scheduling). The requirements of auto-scaling and adaptive scheduling when using services from multiple Clouds will be discussed, and the recent approaches presented in [2,3] will be detailed.


[1] mOSAIC Consortium. Project details at http://www.mosaiccloud.eu. Platform implementation at https://bitbucket.org/mosaic. Documentation at: http://developers.mosaiccloud.eu.

[2] N.M. Calcavecchia, B.A. Caprarescu, E. Di Nitto, D.J. Dubois, D. Petcu, DEPAS: A Decentralized Probabilistic Algorithm for Auto-Scaling, http://arxiv.org/abs/1202.2509, 2012

[3] M. Frincu, N. Villegas, D. Petcu, H.A. Mueller, R. Rouvoy, Self-Healing Distributed Scheduling Platform, Procs. 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'11), IEEE Computer Press, 225–234


Back to Session VII

Building Secure and Transparent Inter-Cloud Infrastructure for Scientific Applications


Yoshio Tanaka

National Institute of Advanced Industrial Science and Technology (AIST)


On 11 March 2011 Japan suffered a major earthquake and resulting tsunami that caused great loss of life and devastated many buildings, industries, and information services. Recent flooding in Thailand has resulted in destruction of property and a disruption of services. Thus, it is prudent for information technology infrastructure and services providers to formulate and implement procedures for rapid disaster recovery of key services which become even more critical during catastrophic events. We have developed and deployed GEO Grid applications on the distributed infrastructure which consists of clusters contributed by our international colleagues. Secure and transparent Inter-Cloud infrastructure was realized by Cloud interoperation and network virtualization. Cloud interoperation enables sharing of virtual machine images by private and public Clouds, i.e. a virtual machine image could be shared by different VM hosting environments including OpenNebula, Rocks, and Amazon EC2. Insights gained through the preliminary experiments indicated the key issue for Inter-Cloud is how we could use technologies for network virtualization. We used OpenFlow for network virtualization to build a secure (isolated) network for Inter-Cloud.

In this presentation, I’ll talk about our experience in building secure and transparent Inter-Cloud infrastructure for scientific applications. Current status and future issues will be presented as well.


Back to Session VIII

Scientific Data Analysis on Cloud and HPC Platforms


Judy Qiu

School of Informatics and Computing


Pervasive Technology Institute

Indiana University


We are in the era of data deluge, and future success in science depends on the ability to leverage and utilize large-scale data. Systems such as MapReduce have been applied to a wide range of “big data” applications, and the open-source Hadoop system has increasingly been adopted by researchers in the HPC, Grid and Cloud communities. These applications include pleasingly parallel applications and many loosely coupled data mining and data analysis problems, for which we will use genomics, information retrieval and particle physics as examples. We will introduce the key features of Hadoop and Twister (a MapReduce variant). Then, we will discuss important issues of interoperability between HPC and commercial clouds and reproducibility using cloud computing environments.
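For readers unfamiliar with the MapReduce model mentioned in this abstract, here is a minimal single-process sketch of its map, shuffle and reduce phases. It does not use the Hadoop or Twister APIs; the function names and the k-mer counting example (chosen because genomics is one of the named application areas) are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle (group by key) and reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                      # map phase
        for key, value in mapper(record):
            groups[key].append(value)           # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


def kmer_mapper(sequence: str, k: int = 3) -> Iterator[Tuple[str, int]]:
    """Emit (k-mer, 1) for every length-k substring of a read."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def count_reducer(key: str, values: List[int]) -> int:
    """Sum the counts emitted for one key."""
    return sum(values)


reads = ["ACGTAC", "GTACGT"]  # invented input reads
counts = map_reduce(reads, kmer_mapper, count_reducer)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be distributed across many machines, which is what makes this class of loosely coupled problems a natural fit for the model.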


Back to Session VIII

The suitability of BSP/CGM model for HPC on Clouds


Alfredo Goldman

Department of Computer Science

University of São Paulo

São Paulo, Brazil


Nowadays the concepts and infrastructures of Cloud Computing are becoming a standard for several applications. Scalability is no longer just a buzzword; it is being used effectively. However, despite the economic advantages of virtualization and scalability, factors such as latency, bandwidth and processor sharing can be a problem for High Performance Computing on the Cloud.


We will provide an overview of how to tackle these problems using the BSP (Bulk Synchronous Parallel) model. We will also introduce the main advantages of CGM (Coarse Grained Model), where the main goal is to minimize the number of communication rounds, which can have an important impact on the performance of BSP algorithms. We will also present our experience of using BSP in an opportunistic grid computing environment. Then we will show several recent models for distributed computing initiatives based on BSP. We will also provide some research directions to improve the performance of BSP applications on Clouds.

Finally, we will present some preliminary experiments comparing the performance of the BSP and MapReduce models.
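To make the role of communication rounds concrete, the sketch below simulates a CGM-style prefix sum that needs only one communication round, together with the standard BSP cost expression (local work plus h-relation times g plus barrier latency l, summed over supersteps). It is a sequential illustration under assumed parameter values, not code from the talk; the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def cgm_prefix_sums(blocks: Sequence[Sequence[int]]) -> List[List[int]]:
    """Prefix sums over data split across p simulated processors."""
    # Superstep 1 (local computation): each processor sums its own block.
    local_totals = [sum(block) for block in blocks]
    # Single communication round, then barrier: each processor learns the
    # totals of all lower-ranked processors.
    offsets = [sum(local_totals[:rank]) for rank in range(len(blocks))]
    # Superstep 2 (local computation): local prefix sums shifted by the offset.
    return [
        [offset + partial for partial in accumulate(block)]
        for offset, block in zip(offsets, blocks)
    ]


def bsp_cost(supersteps: Sequence[Tuple[float, float]], g: float, l: float) -> float:
    """Standard BSP cost: sum over supersteps of w + h*g + l, where w is the
    maximum local work and h the maximum number of words sent or received."""
    return sum(w + h * g + l for w, h in supersteps)


result = cgm_prefix_sums([[1, 2], [3, 4], [5]])
# Invented parameters: a large l models the high barrier latency of a Cloud.
few_rounds = bsp_cost([(100, 10), (50, 0)], g=2, l=30)
many_rounds = bsp_cost([(30, 2)] * 5, g=2, l=30)
print(result, few_rounds, many_rounds)
```

With a high barrier latency, the five-round variant costs more even though its total local work is the same, which is the motivation for minimizing the number of rounds.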


Back to Session VIII

Big Data Analytics for Science Discovery


Valerio Pascucci

Director, Center for Extreme Data Management Analysis and Visualization (CEDMAV)

Associate Director, Scientific Computing and Imaging Institute

Professor, School of Computing, University of Utah

Laboratory Fellow, Pacific Northwest National Laboratory

CTO, ViSUS Inc. (visus.us)


Advanced techniques for analyzing and understanding Big Data models are a crucial ingredient for the success of any supercomputing center and data intensive scientific investigation. Such techniques involve a number of major challenges, such as developing scalable algorithms that run efficiently on the simulation data generated on the largest supercomputers in the world, or incorporating robust methods that are provably correct and complete in their extraction of features from the data.

In this talk, I will present the application of a discrete topological framework for the representation and analysis of large scale scientific data. Due to the combinatorial nature of this framework, we can implement the core constructs of Morse theory without the approximations and instabilities of classical numerical techniques. The inherent robustness of the combinatorial algorithms allows us to address the high complexity of the feature extraction problem for high resolution scientific data.

Our approach has enabled the successful quantitative analysis of several massively parallel simulations, including the study of turbulent hydrodynamic instabilities, porous material under stress and failure, the energy transport of eddies in ocean data used for climate modeling, and lifted flames that lead to clean energy production.

During the talk, I will provide a live demonstration of some software tools for topological analysis of large scale scientific data and discuss the evolution of the organization of the project, highlighting key aspects that enabled us to successfully deploy this new family of tools to scientists in several disciplines.



Valerio Pascucci is the founding Director of the Center for Extreme Data Management Analysis and Visualization (CEDMAV), recently established as a permanent organization at the University of Utah in collaboration with the Pacific Northwest National Laboratory. Valerio is also an Associate Director of the Scientific Computing and Imaging Institute, a Professor in the School of Computing at the University of Utah, and a Laboratory Fellow at PNNL. Before joining the University of Utah, Valerio was the Data Analysis Group Leader of the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and Adjunct Professor of Computer Science at the University of California Davis. Valerio's research interests include Big Data management and analytics, progressive multi-resolution techniques in scientific visualization, discrete topology, geometric compression, computer graphics, computational geometry, geometric programming, and solid modeling. Valerio is the coauthor of more than one hundred refereed journal and conference papers and has been an Associate Editor of the IEEE Transactions on Visualization and Computer Graphics.


Back to Session IX

EUDAT - European scientists and data centers turn to big data collaboration


Wolfgang Gentzsch

Advisor, EUDAT


EUDAT is a pan-European big data project, bringing together a unique consortium of research communities and national data and high performance computing centers, aiming to contribute to the production of a collaborative data infrastructure to support Europe’s scientific and research data requirements.

The aim of this talk is to highlight the main objectives of the EUDAT project and its Collaborative Data Infrastructure, and to discuss a set of cross-disciplinary data services designed to service all European research communities, such as safe replication of data sets among different sites, data staging to compute facilities, easy data storage, metadata, single sign-on, and persistent identifiers.


Back to Session IX

Smart Cities and Opportunities for Convergence of Open Data and Computational Modeling


Charlie Catlett

Argonne National Laboratory and The University of Chicago


The increasing scale of new urban infrastructure projects and the accelerating rate of demand for such projects bring into focus several opportunities, indeed mandates, to harness information technologies that have not been traditionally applied to urban design, development, and evaluation. Architectural planning tools in use today rely on simplified models, typically lacking adequate treatment of complexity, underlying physical processes, or socio-economic factors which are at the heart of stated city objectives such as "safe," "harmonious," or "sustainable." To date these objectives have been difficult to measure due to lack of data; however, the trend toward transparency and public access to "open data" is already enabling interdisciplinary scientific analysis and performance prediction at unprecedented detail. Our experience with cities over the past century suggests that the traditional approach of simplified models, combined with heuristics, often produces unintended results that are manifest only after they are difficult, or impractical, to unravel. Incorporating open data and computational modeling into urban planning and design has the potential to radically shorten this experience loop, reducing risk while also allowing for innovation that would be otherwise impractical.


Back to Session IX

Discovering Knowledge from Massive Social Networks and Science Data - Next Frontier for HPC


Prof. Alok N. Choudhary

John G. Searle Professor

Electrical Engineering and Computer Science

Northwestern University


Knowledge discovery in science and engineering has been driven by theory, experiments and, more recently, by large-scale simulations using high-performance computers. Modern experiments and simulations involving satellites, telescopes, high-throughput instruments, imaging devices, sensor networks, accelerators, and supercomputers yield massive amounts of data. At the same time, the world, including social communities, is creating massive amounts of data at an astonishing pace. Just consider Facebook, Google, articles, papers, images, videos and others. But even more complex is the network that connects the creators of data. There is knowledge to be discovered in both. This represents a significant and interesting challenge for HPC and opens opportunities for accelerating knowledge discovery.


In this talk, following an introduction to high-end data mining and the basic knowledge discovery paradigm, we present the process, challenges and potential of this approach. We will present many case examples, results and future directions including (1) mining sentiments from massive datasets on the web; (2) real-time stream mining of text from millions of and tweets to identify influencers and sentiments of people; (3) discovering knowledge from massive social networks containing millions of nodes and hundreds of billions of edges from real-world Facebook, Twitter and other social network data (e.g., can anyone follow Presidential campaigns in real time?); and (4) discovering knowledge from massive datasets from science applications including climate, medicine, biology and sensors.




Alok Choudhary is a John G. Searle Professor of Electrical Engineering and Computer Science at Northwestern University. He is the founding director of the Center for Ultra-scale Computing and Information Security (CUCIS) <http://cucis.ece.northwestern.edu>.

He received the National Science Foundation's Young Investigator Award in 1993. He has also received an IEEE Engineering Foundation award, an IBM Faculty Development award, and an Intel Research Council award.

He is a fellow of IEEE, ACM and AAAS. His research interests are in high-performance computing, data intensive computing, scalable data mining, computer architecture, high-performance I/O systems and software and their applications. Alok Choudhary has published more than 350 papers in various journals and conferences and has graduated 30 PhD students.

Techniques developed by his group can be found on every modern processor and scalable software developed by his group can be found on most supercomputers.


Back to Session IX

High Performance Computing Challenges from an Aerospace Perspective


Greg Tallant


Lockheed Martin is a world leader in the design, development, and integration of large complex systems. In this role Lockheed Martin is involved in many areas of technology that have an extremely broad range of computational challenges. These challenges range from engineering problems like computational fluid dynamics (CFD) and structural analysis to real-time signal processing and embedded systems control. One of the biggest challenges currently being faced by Lockheed Martin is developing more affordable system solutions that are insensitive to increases in system complexity. To address this challenge, Lockheed Martin has assembled a team spanning our business areas and the academic community to explore quantum computing resources for application to new systems engineering capabilities and reduced costs for software development and testing. The objective of our effort is to develop a system-level verification & validation (V&V) approach and enabling tools that generate probabilistic measures of correctness for an entire large-scale cyber-physical system, where V&V costs are insensitive to system complexity. In this presentation we will provide an overview of our current research and present some of the initial results obtained to date.


Back to Session X

Macro-scale phenomena of arterial coupled cells: a Massively Parallel simulation


Timothy David

Centre for Bioengineering

University of Canterbury

New Zealand


Impaired mass transfer characteristics of blood-borne vasoactive species such as ATP in regions such as an arterial bifurcation have been hypothesized as a prospective mechanism in the etiology of atherosclerotic lesions. Arterial endothelial cells (ECs) and smooth muscle cells (SMCs) respond differentially to altered local hemodynamics and produce coordinated macro-scale responses via intercellular communication. Using a computationally designed arterial segment comprising large populations of mathematically modelled coupled ECs and SMCs, we investigate their response to spatial gradients of blood-borne agonist concentrations and the effect of the micro-scale driven perturbation on the macro-scale. By altering homocellular (between the same cell type) and heterocellular (between different cell types) intercellular coupling, we simulated four cases of normal and pathological arterial segments experiencing an identical gradient in the concentration of the agonist. Results show that the heterocellular calcium coupling between ECs and SMCs is important in eliciting a rapid response when the vessel segment is stimulated by the agonist gradient. In the absence of heterocellular coupling, homocellular calcium coupling between smooth muscle cells is necessary for the axial propagation of calcium waves from downstream to upstream cells. Desynchronized intracellular calcium oscillations in coupled smooth muscle cells are mandatory for this propagation. Upon decoupling the heterocellular membrane potential, the arterial segment loses the inhibitory effect of endothelial cells on the calcium dynamics of underlying smooth muscle cells. The full system, comprising hundreds of thousands of coupled nonlinear ordinary differential equations, was simulated on the massively parallel Blue Gene architecture. The use of massively parallel computational architectures shows the capability of this approach to address macro-scale phenomena driven by elementary micro-scale components of the system.


Back to Session X

Overcoming Communication Latency Barriers in Massively Parallel Molecular Dynamics Simulation on Anton


Ron Dror

D. E. Shaw Research


Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency.  This talk will describe the techniques used to reduce latency and mitigate its effects on performance in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art.  Achieving this speedup required both specialized hardware mechanisms and a restructuring of the application software to reduce network latency, sender and receiver overhead, and synchronization costs.  Key elements of Anton’s approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes and leveraging fine-grained communication.  Anton delivers end-to-end inter-node latency significantly lower than any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 3% that of the next-fastest MD platform.
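The idea behind counted remote writes can be illustrated in software: the receiver knows in advance how many writes to expect, senders write directly into a pre-allocated buffer, and each arrival increments a counter, so no separate synchronization message is needed. The sketch below is only an analogy using Python threads; the `CountedBuffer` class is hypothetical, and a condition variable stands in for whatever mechanism the Anton hardware actually uses.

```python
import threading
from typing import List, Optional


class CountedBuffer:
    """Receive buffer that becomes readable once the expected number of
    writes has landed (a software stand-in for a counted remote write)."""

    def __init__(self, slots: int, expected_writes: int) -> None:
        self.data: List[Optional[int]] = [None] * slots
        self.expected = expected_writes
        self.count = 0
        self._cond = threading.Condition()

    def remote_write(self, slot: int, value: int) -> None:
        """Sender side: deposit the value and bump the arrival counter."""
        with self._cond:
            self.data[slot] = value
            self.count += 1
            if self.count >= self.expected:
                self._cond.notify_all()

    def wait_complete(self) -> List[Optional[int]]:
        """Receiver side: block until the counter reaches the expected value."""
        with self._cond:
            self._cond.wait_for(lambda: self.count >= self.expected)
            return list(self.data)


def run_exchange(n_senders: int = 4) -> List[Optional[int]]:
    """Each simulated sender writes rank*rank into its own slot."""
    buf = CountedBuffer(slots=n_senders, expected_writes=n_senders)
    senders = [
        threading.Thread(target=buf.remote_write, args=(rank, rank * rank))
        for rank in range(n_senders)
    ]
    for thread in senders:
        thread.start()
    received = buf.wait_complete()
    for thread in senders:
        thread.join()
    return received


print(run_exchange())
```

The receiver never exchanges a handshake with any individual sender; it only observes the count, which is what removes a round trip from the critical path in this style of communication.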

Back to Session X

Job scheduling of parametric computational mechanics studies on cloud computing infrastructure


Carlos García Garino a,b , Cristian Mateos c,d and Elina Pacini a


a Information & Communication Technologies Institute (ITIC) and

b School of Engineering, National University of Cuyo, Mendoza, Argentina


c ISISTAN Institute, UNICEN University, Tandil, Buenos Aires, Argentina


d Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)


Parameter Sweep Experiments (PSEs) allow scientists and engineers to conduct experiments by running the same program code against different input data. In particular, nonlinear computational mechanics problems are addressed in this case [1]. PSEs usually result in many jobs with high computational requirements. Thus, distributed environments, particularly Clouds, can be employed to fulfill these demands. However, job scheduling is challenging, as it is an NP-complete problem. Recently, Cloud schedulers based on bio-inspired techniques, which work well in approximating problems, have been reviewed [2]. Unfortunately, existing proposals ignore job priority, which is a very important aspect in PSEs since it allows accelerating PSE results processing and visualization in scientific Clouds.

In this work we propose a new Cloud scheduler based on Ant Colony Optimization, the most popular bio-inspired technique, which also exploits well-known notions from operating systems theory. Simulated experiments performed with real PSE job data and other Cloud scheduling policies indicate that the new proposal allows for more agile job handling while reducing PSE completion time.
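The following is a generic, textbook-style sketch of Ant Colony Optimization applied to assigning jobs to virtual machines so as to reduce the overall completion time (makespan). It is not the scheduler proposed in this work: the function name, all parameters, the pheromone update rule and the way priorities are used (higher-priority jobs are simply placed first) are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def aco_schedule(
    job_lengths: Sequence[float],
    vm_speeds: Sequence[float],
    priorities: Optional[Sequence[int]] = None,
    n_ants: int = 20,
    n_iters: int = 30,
    evaporation: float = 0.5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Return (assignment, makespan), where assignment[j] is the VM of job j."""
    rng = random.Random(seed)
    n_jobs, n_vms = len(job_lengths), len(vm_speeds)
    if priorities is None:
        priorities = [0] * n_jobs
    # Assumption: higher-priority jobs are placed first, so they start earlier.
    order = sorted(range(n_jobs), key=lambda j: -priorities[j])
    pheromone = [[1.0] * n_vms for _ in range(n_jobs)]
    best_assignment: List[int] = [0] * n_jobs
    best_makespan = float("inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            loads = [0.0] * n_vms
            assignment = [0] * n_jobs
            for j in order:
                # Heuristic: favour the VM that would finish this job earliest.
                finish = [loads[v] + job_lengths[j] / vm_speeds[v] for v in range(n_vms)]
                weights = [pheromone[j][v] / finish[v] for v in range(n_vms)]
                v = rng.choices(range(n_vms), weights=weights)[0]
                loads[v] = finish[v]
                assignment[j] = v
            makespan = max(loads)
            if makespan < best_makespan:
                best_assignment, best_makespan = assignment, makespan
        # Evaporate all trails (with a small floor), then reinforce the best solution.
        for j in range(n_jobs):
            for v in range(n_vms):
                pheromone[j][v] = max(pheromone[j][v] * (1.0 - evaporation), 1e-6)
            pheromone[j][best_assignment[j]] += 1.0 / best_makespan
    return best_assignment, best_makespan


# Invented example: four jobs, two VMs of different speed.
lengths = [4.0, 2.0, 6.0, 3.0]
speeds = [1.0, 2.0]
assignment, makespan = aco_schedule(lengths, speeds, priorities=[1, 3, 2, 0])
print(assignment, makespan)
```

Fixing the random seed makes runs repeatable; a real study would compare such a scheduler against other policies over many seeds and realistic job traces.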


[1] Pacini, E., Ribero, M., Mateos, C., Mirasso, A., García Garino, C.: Simulation on cloud computing infrastructures of parametric studies of nonlinear solids problems. In: F. V. Cipolla-Ficarra et al. (ed.) Advances in New Technologies, Interactive Interfaces and Communicability (ADNTIIC 2011). pp. 56–68. Lecture Notes in Computer Science (2011), to appear.

[2] Pacini, E., Mateos, C., García Garino, C.: Schedulers based on Ant Colony Optimization for Parameter Sweep Experiments in Distributed Environments. in S. Bhattacharyya and P. Dutta (Editors), Handbook of Research on Computational Intelligence for Engineering, Science and Business. IGI Global, 2012. In press.


Back to Session X

Multi-Resolution Streams of Big Scientific Data: Scaling Visualization Tools from Handheld Devices to In-Situ Processing


Valerio Pascucci

Director, Center for Extreme Data Management Analysis and Visualization (CEDMAV)

Associate Director, Scientific Computing and Imaging Institute

Professor, School of Computing, University of Utah

Laboratory Fellow, Pacific Northwest National Laboratory

CTO, ViSUS Inc. (visus.us)


Effective use of data management techniques for massive scientific data is a crucial ingredient for the success of any supercomputing center and data intensive scientific investigation. Developing such techniques involves a number of major challenges such as the real-time management of massive data, or the quantitative analysis of scientific features of unprecedented complexity. Addressing these challenges requires interdisciplinary research in diverse topics including the mathematical foundations of data representations, the design of robust, efficient algorithms, and the integration with relevant applications in physics, biology, or medicine.

In this talk, I will present a scalable approach for processing large-scale scientific data with high-performance selective queries on multiple terabytes of raw data. The combination of this data model with progressive streaming techniques allows achieving interactive processing rates on a variety of computing devices ranging from handheld devices like an iPhone, to simple workstations, to the I/O of parallel supercomputers. I will demonstrate how our system has enabled the real-time streaming of massive combustion simulations from DOE platforms such as Hopper2 at LBNL and Intrepid at ANL.

During the talk, I will provide a live demonstration of the effectiveness of some software tools developed in this effort and discuss deployment strategies in an increasingly heterogeneous computing environment.



Valerio Pascucci is the founding Director of the Center for Extreme Data Management Analysis and Visualization (CEDMAV), recently established as a permanent organization at the University of Utah in collaboration with the Pacific Northwest National Laboratory (PNNL). Valerio is also an Associate Director of the Scientific Computing and Imaging Institute, a Professor in the School of Computing, University of Utah, and a Laboratory Fellow of PNNL. Before joining the University of Utah, Valerio was the Data Analysis Group Leader of the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and Adjunct Professor of Computer Science at the University of California Davis. Valerio's research interests include Big Data management and analytics, progressive multi-resolution techniques in scientific visualization, discrete topology, geometric compression, computer graphics, computational geometry, geometric programming, and solid modeling. Valerio is the coauthor of more than one hundred refereed journal and conference papers and has been an Associate Editor of the IEEE Transactions on Visualization and Computer Graphics.


Back to Session X

Portability and Interoperability in Clouds: Agents, Semantic and Volunteer computing can help - the mOSAIC and Cloud@Home projects


Beniamino Di Martino

Second University of Naples - mOSAIC Project Coordinator


Cloud vendor lock-in and interoperability gaps arise (among many other reasons) when the semantics of resources and services, and of Application Programming Interfaces, are not shared.

Standards and techniques borrowed from the SOA and Semantic Web Services areas might help in obtaining shared, machine-readable descriptions of Cloud offerings (resources, services at Platform and Application level, and their API groundings), thus allowing automatic discovery and matchmaking, and supporting selection, brokering, interoperability and even composition of Cloud services among multiple Clouds.

The EU-funded mOSAIC project (http://www.mosaic-cloud.eu) aims at designing and developing an innovative open-source API and platform that enables applications to be Cloud-provider neutral and to negotiate Cloud services as requested by their users. Using the mOSAIC Cloud ontology and Semantic Engine, developers of cloud applications will be able to specify their service and resource requirements and communicate them to the mOSAIC Platform and Cloud Agency.

The mOSAIC Cloud Agency will implement a multi-agent brokering mechanism that will search for Cloud services matching the applications’ request, and possibly compose the requested service.

The PRIN (National Relevance Research Project) project Cloud@Home (http://cloudathome.unime.it/) aims at implementing a volunteer Cloud, a paradigm that mixes aspects of both Cloud and Volunteer computing. The main enhancement of Cloud@Home is the capability of a host to be a contributing host and a consumer host at the same time, establishing a symbiotic interaction with the Cloud@Home environment.


Back to Session XI

Smart Sensing for Discovering and Reducing Energy Wastes in Office Buildings


Amy Wang

Institute for Interdisciplinary Information Sciences

Tsinghua University

Beijing, CHINA


A recent survey shows that in our offices up to 70% of computers and related equipment are left on all the time. Equipment energy costs can be reduced by 20% just by turning equipment off when it is not in use. However, it is very challenging to develop an automatic control system to discover and reduce this energy waste. In particular, to discover energy waste, the running states of a massive number of appliances need to be tracked in real time and checked against real-time user requirements to judge whether an electrical appliance is wasting energy. Because the appliances are so numerous and the user requirements are highly dynamic, it is generally very difficult and cost-inefficient to track both. In this talk, we report how recent advances in smart metering and compressive sensing technologies can be exploited to solve these challenging problems. Although the real-time appliance states and the real-time user requirements compose very high-dimensional dynamic signals, they are converted to sparse signals by temporal and spatial transformations, respectively. Compressive sensing systems using smart meters and infrared sensors are designed to track these sparsified signals using lightweight metering and sequential decoding. In particular, the design methodologies, theoretical bounds and experimental results will be presented in this talk.
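To make the idea of sparsifying by a temporal transformation concrete, the sketch below assumes the simplest such transformation: first-order differencing of an appliance's on/off sequence. The abstract does not say which transformation is actually used, so this choice and all names are assumptions. The sketch covers only the sparsifying step, not the compressive measurement or the sequential decoding.

```python
def temporal_difference(states):
    """Return the change at each time step; the first sample is kept as-is."""
    if not states:
        return []
    return [states[0]] + [cur - prev for prev, cur in zip(states, states[1:])]


def reconstruct(diffs):
    """Invert temporal_difference with a running sum."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


def nonzeros(signal):
    """Count the non-zero entries, i.e. the sparsity level of the signal."""
    return sum(1 for x in signal if x != 0)


# Hypothetical appliance: off, switched on for five steps, then off again.
states = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
diffs = temporal_difference(states)
print(nonzeros(states), nonzeros(diffs))  # 5 2
```

An appliance that rarely switches produces a difference signal that is non-zero only at the switching events, however long it stays on, and the running sum recovers the original states exactly.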


Back to Session XI

Project ADVANCE: Ant Colony Optimisation (ACO) using coordination programming based on S-Net


Alex Shafarenko

Department of Computer Science

University of Hertfordshire

Hatfield, UK


This talk presents some of the results of the EU Framework 7 project ADVANCE. We report our experiences of applying an HPC structuring technique, dataflow coordination programming, and a specific programming environment, the language S-Net, to restructuring existing numerical code developed by SAP AG. The code implements an ACO solution to the Travelling Salesman Problem. We have converted the ACO algorithm to a stream-processing network and encoded it as a coordination program. We then implemented this solution either by using explicit thread management in C (a manually coded version) or by applying our coordination compiler, and compared the results. We find that the use of S-Net results in a low code-development cost while achieving the same scaling characteristics and very similar performance compared to the manually coded solution at large system sizes. The message-driven (as opposed to message-passing) nature of the coordinating streaming code creates the prerequisites for a large-scale distributed, but still easily manageable and maintainable, implementation. We argue that it is this maintainability and manageability that make our approach uniquely suitable for industrial uptake of HPC.
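For readers unfamiliar with ACO applied to the Travelling Salesman Problem, the following is a minimal sequential sketch of the textbook scheme: ants build tours probabilistically, pheromone evaporates, and every tour deposits pheromone in inverse proportion to its length. It is neither the SAP AG code nor an S-Net coordination program; the function names and the parameter defaults (`alpha`, `beta`, `rho`, the ant and iteration counts) are illustrative assumptions.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def build_tour(dist, pher, alpha, beta, rng):
    """One ant builds a tour, preferring short, pheromone-rich edges."""
    n = len(dist)
    tour = [rng.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        cur = tour[-1]
        candidates = sorted(unvisited)
        weights = [
            (pher[cur][j] ** alpha) * ((1.0 / dist[cur][j]) ** beta)
            for j in candidates
        ]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Return (best_tour, best_length) found by a basic ant system."""
    rng = random.Random(seed)
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]
    best, best_len = None, math.inf
    for _ in range(n_iters):
        tours = [build_tour(dist, pher, alpha, beta, rng) for _ in range(n_ants)]
        # Evaporation: every edge loses a fraction rho of its pheromone.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        # Deposit: shorter tours add more pheromone to the edges they use.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best, best_len = tour, length
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length
    return best, best_len


# Hypothetical instance: four cities on the corners of a unit square.
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in coords] for p in coords]
best_tour, best_length = aco_tsp(dist)
print(best_tour, best_length)
```

The tour-construction step for each ant is independent of the others within an iteration, which is the part that lends itself to being expressed as parallel stages in a stream-processing network.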


Back to Session XI