1st International Workshop on Strategic Development of High Performance Computers

 

The purpose of this workshop is to report Japan’s research activities on post-petaflops, or exascale, machines, and to discuss international collaboration on system software development. In 2012, Japan started a two-year program to investigate the feasibility of developing a post-K computer. Four research teams have been involved in this program. A team headed by AICS is responsible for studying the scientific and social issues to be addressed by 2020 and for drawing up a science roadmap through 2020. Three teams, headed by Tohoku U., U. of Tsukuba, and U. of Tokyo, have been working with NEC, Hitachi, and Fujitsu to study three types of next-generation supercomputers.
The US DOE (Department of Energy) and Japan’s MEXT (Ministry of Education, Culture, Sports, Science and Technology) have agreed to pursue cooperation between the U.S. and Japan on system software for post-petascale computing, including collaborative R&D and international standardization of system software. They organized an SC12 BOF session to discuss this matter and concluded that draft plans for collaboration, coordination, and management would be discussed at this workshop.

Date: March 18-19, 2013
Venue: Tsukuba International Congress Center (Convention Hall 200), Tsukuba Japan
http://www.epochal.or.jp/eng/index.html
Registration: http://goo.gl/STIgj (by March 7, 2013)

  • Registration Fee: Free
  • Optional: Banquet JPY5,000
Co-sponsored by:
  • Cyberscience Center, Tohoku University
  • Center for Computational Sciences, University of Tsukuba
  • Information Technology Center, The University of Tokyo
  • Global Scientific Information and Computing Center, Tokyo Institute of Technology
  • Academic Center for Computing and Media Studies, Kyoto University
  • Information Technology Research Institute, AIST
  • RIKEN Advanced Institute for Computational Science
  • Development of System Software Technologies for post-Peta Scale High
    Performance Computing, Basic Research Programs: CREST, JST
Supported by:
  • High Performance Computing Infrastructure Consortium
Organizing Co-Chairs:
  • Masaaki Kondo (UEC Tokyo)
  • Naoya Maruyama (RIKEN)
  • Yutaka Ishikawa (University of Tokyo)
Contact: sdhpc[AT]hpc.is.uec.ac.jp

Final Program:

Monday, March 18

09:30-12:00 International Collaboration Planning (This session is by invitation only)
09:30-09:45 Welcome and Introduction
09:45-12:00 Panel Discussion
Moderator:
Peter Beckman (ANL)
Panelists:
Robert Ross (ANL), Naoya Maruyama (RIKEN AICS), Masaaki Kondo (U. of Electro-Communications),
Takeshi Nanri (Kyushu U.), Toshio Endo (Titech), Atsushi Hori (RIKEN AICS),
Yutaka Ishikawa (U. of Tokyo/RIKEN AICS), Mitsuhisa Sato (U. of Tsukuba)
12:00-13:00 Registration
13:00-13:05 Opening Remarks
Yutaka Ishikawa (University of Tokyo)
13:05-13:45 Welcome Address
William Harrod (DOE), Takahiro Hayashi (MEXT)
13:45-14:30 [Invited Talk] Resilience and Reliability for Future Systems [slide]
Bronis de Supinski (LLNL)
14:30-15:15 [Invited Talk] Performance Analysis and Optimization of Scientific Applications on Extreme-Scale Computer Systems [slide]
Bernd Mohr (JSC)
15:15-15:30 BREAK
15:30-16:30 Report on Feasibility Study on Future HPC Infrastructure I [slide]
Hirofumi Tomita (RIKEN)
16:30-17:30 Report on Feasibility Study on Future HPC Infrastructure II [slide]
Mitsuhisa Sato (University of Tsukuba)
18:00-20:00 Banquet at INCA ROSE (Fee: JPY5,000)
http://www.inca-rose.jp/contents/information.html

 

Tuesday, March 19

09:00-10:00 Report on Feasibility Study on Future HPC Infrastructure III [slide]
Hiroaki Kobayashi (Tohoku University)
10:00-10:15 BREAK
10:15-11:00 [Invited Talk] Automated Exploration of the HPC Co-Design Space
Jeffrey Vetter (ORNL)
11:00-11:45 [Invited Talk] The DEEP Project
Norbert Eicker (JSC)
11:45-13:30 LUNCH
13:30-14:15 [Invited Talk] Big Data and Exascale Systems — Challenges and Opportunities [slide]
Alok Choudhary (Northwestern University)
14:15-15:15 Report on Feasibility Study on Future HPC Infrastructure IV [slide]
Yutaka Ishikawa (University of Tokyo)
15:15-15:30 BREAK
15:30-16:00 Report on draft plan of international collaboration
Peter Beckman (ANL), Yutaka Ishikawa (University of Tokyo)
16:00-16:10 Closing

Invited Talks:

Title: Resilience and Reliability for Future Systems

Speaker: Bronis de Supinski (LLNL)
Abstract: Applications running on current systems already incur
substantial costs to handle faults. The size of the systems is growing
dramatically – our Sequoia system has over 1.5 million cores and 6
million hardware threads while exascale systems are projected to have
concurrency levels of at least a half billion. The increasing
component counts and decreasing feature sizes that will support these
levels of concurrency imply that system mean time to interrupt (MTTI)
will decrease. Current applications primarily rely on
checkpoint/restart to handle faults. Whether more advanced strategies,
such as algorithm-based fault tolerance (ABFT), will be necessary for
future systems is a subject of considerable debate. We advocate a
strategy that combines improvements to checkpoint/restart techniques
with ABFT and advanced techniques to detect failures. In this talk, I
will summarize this strategy to provide for the resilience needs of
our applications and detail some recent advances that contribute to it.
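
As a point of reference for the baseline technique the abstract mentions, the following is a minimal sketch of application-level checkpoint/restart in Python. The file name, state layout, and checkpoint interval are illustrative assumptions for this sketch only; they do not reflect the LLNL strategy or any production library.

    import os
    import pickle

    CHECKPOINT_FILE = "state.ckpt"   # illustrative path, not a real convention

    def save_checkpoint(state):
        """Write the application state to disk, atomically replacing the old file."""
        tmp = CHECKPOINT_FILE + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, CHECKPOINT_FILE)   # atomic rename avoids torn checkpoints

    def load_checkpoint():
        """Return the last saved state, or None if no checkpoint exists."""
        if not os.path.exists(CHECKPOINT_FILE):
            return None
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)

    def run(total_steps=1000, checkpoint_interval=100):
        # Resume from the last checkpoint after a failure, or start fresh.
        state = load_checkpoint() or {"step": 0, "value": 0.0}
        for step in range(state["step"], total_steps):
            state["value"] += 1.0 / (step + 1)   # stand-in for real computation
            state["step"] = step + 1
            if state["step"] % checkpoint_interval == 0:
                save_checkpoint(state)
        return state

    if __name__ == "__main__":
        print(run())

ABFT, by contrast, encodes redundancy into the algorithm's own data (for example, checksums carried through a matrix factorization) so that some errors can be detected and corrected without rolling back to a checkpoint.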

Title: Performance Analysis and Optimization of Scientific Applications on Extreme-Scale Computer Systems

Speaker: Bernd Mohr (JSC)
Abstract: The number of processor cores available in
high-performance computing systems is steadily increasing. In the
November 2012 list of the TOP500 supercomputers, only three systems
have fewer than 4,096 processor cores, and the average is almost 30,000
cores, an increase of 12,000 in just one year. Even the
median system size is already over 15,000 cores. While these machines
promise ever more compute power and memory capacity to tackle today’s
complex simulation problems, they force application developers to
greatly enhance the scalability of their codes to be able to exploit
that potential. To better support them in their porting and tuning process, many
parallel tools research groups have already started to work on scaling
their methods, techniques and tools to extreme processor counts. In
this talk, we survey existing performance analysis and optimization
tools covering both profiling and tracing techniques, report on our
experience in using them in extreme scaling environments, review
existing working and promising new methods and techniques, and discuss
strategies for addressing unsolved issues and problems. Special
emphasis is put on European projects and tools.
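
As a toy illustration of the profiling-versus-tracing distinction the abstract draws (and not an example of any of the tools surveyed in the talk), the sketch below records both aggregated per-region timings and timestamped events; the region names are hypothetical.

    import time
    from collections import defaultdict

    profile = defaultdict(float)   # profiling: aggregated time per code region
    trace = []                     # tracing: individual timestamped events

    class timed_region:
        """Context manager that feeds both the profile and the trace."""
        def __init__(self, name):
            self.name = name
        def __enter__(self):
            self.start = time.perf_counter()
            trace.append((self.start, "enter", self.name))
            return self
        def __exit__(self, *exc):
            end = time.perf_counter()
            trace.append((end, "exit", self.name))
            profile[self.name] += end - self.start

    def solver_step():
        with timed_region("compute"):
            sum(i * i for i in range(10000))    # stand-in for numerical work
        with timed_region("exchange_halo"):
            time.sleep(0.001)                   # stand-in for communication

    for _ in range(5):
        solver_step()

    print("profile (seconds per region):", dict(profile))
    print("trace (number of events):", len(trace))

Profiles stay small at scale because they only aggregate, whereas traces grow with the number of events and processes, which is one reason scaling tracing tools to extreme processor counts is hard.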

Title: Automated Exploration of the HPC Co-Design Space

Speaker: Jeffrey Vetter (ORNL)
Abstract: HPC architects and applications scientists often use
performance models to explore a multidimensional design space of
architectural characteristics, algorithm designs, and application
parameters. With traditional performance modeling tools, these
explorations forced users to first develop an ad hoc performance
model and then repeatedly evaluate and analyze the model manually.
These manual investigations proved laborious and error prone. More
importantly, the complexity of this traditional process often forced
users to oversimplify their investigations.
To address this challenge of design space exploration, we
have recently extended our Aspen (Abstract Scalable Performance
Engineering Notation) performance modeling language with three new
language constructs: user-defined resources, parameter ranges, and a
collection of costs in the abstract machine model. Then, we use these
constructs to enable automated design space exploration via a
nonlinear optimization solver. We show how four interesting classes of
design space exploration scenarios can be derived from Aspen models
and formulated as pure nonlinear programs. The analysis tools are
demonstrated using examples based on Aspen models for a
three-dimensional Fast Fourier Transform, the CoMD molecular dynamics
proxy application, and the DARPA Streaming Sensor Challenge Problem.
Our results show that this approach can compose and solve arbitrary
performance modeling questions quickly and rigorously when compared to
the traditional manual approach.
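
Aspen is a domain-specific language, and its syntax is not reproduced here. Purely to illustrate the workflow the abstract describes (an analytical model plus parameter ranges explored automatically), the sketch below uses plain Python and a brute-force sweep in place of a nonlinear solver; the model's constants and scaling terms are illustrative assumptions, not values from the Aspen FFT model.

    import math

    def predicted_time(n, p, flops_per_node=1e12, bytes_per_word=16, bw_per_node=1e10):
        """Toy analytical model of a 3-D FFT of size n^3 on p nodes."""
        compute = 5 * n**3 * math.log2(n) / (p * flops_per_node)   # flop term
        comm = (n**3 * bytes_per_word / p) / bw_per_node           # data moved per node
        return compute + comm

    # Parameter ranges; an exhaustive sweep stands in for the optimization solver.
    problem_sizes = [2**k for k in range(8, 12)]   # n = 256 .. 2048
    node_counts = [2**k for k in range(4, 14)]     # p = 16 .. 8192

    for n in problem_sizes:
        best_p, best_t = min(((p, predicted_time(n, p)) for p in node_counts),
                             key=lambda pair: pair[1])
        print(f"n={n:5d}: best p={best_p:5d}, predicted time={best_t:.4f} s")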

Title: The DEEP Project

Speaker: Norbert Eicker (JSC)
Abstract: Cluster computers are dominating high performance
computing today. Basically, such machines for massively parallel
processing are set up from commodity building blocks. The success of
this architecture is based on the fact that it profits from the
improvements provided by mainstream computing well known under the
label of Moore’s law.
With multi-petaflop systems in production today, the next goal in HPC is to
reach exascale (10^18 operations per second) by the end of the decade.
Obviously, this target introduces new challenges. First of all, there are
technological problems such as energy efficiency and resiliency to be
overcome. Furthermore, it is questionable whether general-purpose CPUs will
remain competitive, from an energy-efficiency point of view, with more
specialized solutions such as accelerators, namely GPUs. The scalability of
today’s systems is limited by the way accelerators are employed. Therefore it
will become necessary to rethink the cluster architecture in HPC in order to
prolong its success into the future.
The EU project DEEP proposes an advanced architecture combining a Cluster
with a so-called Booster element consisting of many-core CPUs interconnected
by a high-performance fabric. It aims at an actual implementation of this
concept based on Intel’s Xeon Phi processor architecture and the EXTOLL
high-performance interconnect, combined with the advanced software stack
required to operate and use the Booster hardware. Besides the actual
system-level layers of software, the latter includes forward-looking
programming paradigms that enable application programmers to express the
various levels of scalability embedded in their problems in a
straightforward and maintainable way.
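
The DEEP concept spans hardware and a full software stack, so no short example can do it justice. Purely as an illustration of the split the abstract describes (control-heavy code staying on the Cluster, throughput-oriented kernels dispatched to the Booster), here is a toy Python sketch in which a local process pool stands in for the Booster; the function names and the dispatch mechanism are hypothetical and do not correspond to the actual DEEP software.

    from concurrent.futures import ProcessPoolExecutor

    def control_part(params):
        """Low-scalability, control-intensive work kept on the Cluster side."""
        return [p * 2 for p in params]

    def scalable_kernel(n):
        """Highly scalable work that a Booster-like pool of many-core nodes
        would execute; a local process pool is only a stand-in here."""
        return sum(i * i for i in range(n))

    def run(params):
        prepared = control_part(params)
        # "Offload" the scalable kernels to the Booster stand-in.
        with ProcessPoolExecutor() as booster:
            return list(booster.map(scalable_kernel, prepared))

    if __name__ == "__main__":
        print(run([10_000, 20_000, 30_000]))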

Title: Big Data and Exascale Systems — Challenges and Opportunities

Speaker: Alok Choudhary (Northwestern University)
Abstract: The primary goal of exascale systems is to accelerate scientific
discoveries and engineering innovation, yet the impact of exascale will be
measured not only by how fast a single simulation is performed, but rather by
the speed and acceleration of overall knowledge discovery. Modern experiments
and simulations involving satellites, telescopes, high-throughput instruments,
imaging devices, sensor networks, accelerators, and supercomputers yield
massive amounts of data. Processing, mining, and analyzing these data
effectively and efficiently will therefore be a critical component of the
knowledge discovery process, as we can no longer rely upon traditional ways of
dealing with the data given their scale and speed. Interestingly, an exascale
system can be thought of as another instrument generating “big data”. But,
unlike most other instruments, such a system also presents an opportunity for
big data analytics. Thus the fundamental question is: what are the challenges
and opportunities in making exascale systems an effective platform that not
only performs traditional simulations but is also suitable for data-intensive
and data-driven computing, accelerating time to insight? This talk will address
these emerging challenges and opportunities.