Structure

The project activities are organised around four technical and two cross-cutting work packages (WPs), the latter being dedicated to the dissemination and training, and to the overall project and risk management.

Training will focus on closing the gap between the science level and the hardware-specific code level by applying the newly developed DSL toolchain to key algorithmic motifs, and by using it to achieve performance portability for the novel mathematical concepts developed in ESCAPE-2.

The work package interactions are illustrated in the figure to the right. WP1 provides a diverse range of relevant algorithmic motifs (weather and climate dwarfs) for defining and working with a comprehensive domain-specific language (DSL) toolchain (WP2). WP3 covers the subsequent integration into benchmarks for weather and climate prediction, the progressive development of a domain-specific HPCW benchmark, and the back-integration of the DSL toolchain output into models. WP4 applies verification, validation and uncertainty quantification (VVUQ) concepts to both weather and climate dwarfs and full prediction system workloads. Dissemination and training (WP5) will encompass elements from all technical work packages.

[Figure: structure of the work packages and their interactions]

Aim: WP1 will develop mathematical methods and implement advanced and disruptive algorithms suitable for extreme-scale parallelism that achieve major improvements in the accuracy, efficiency, fault-tolerance, and scalability of dynamical cores and of physical parametrizations for next-generation weather and climate prediction models. Moreover, WP1 will extract and provide a range of relevant algorithmic motifs (weather and climate dwarfs) as a prerequisite for other work packages. These will include key algorithms of advection, time-stepping methodologies, and of physical parametrizations, representative for leading European weather and climate models.

Approach and methodology: A significant part of this work package will build on the weather and climate dwarfs developed in ESCAPE. To widen the spectrum of algorithmic concepts used in Earth system modelling, contributions will be extracted from the ocean models NEMO and ICON-ocean, and from radiation physical parametrizations in the context of Artificial Neural Networks (ANN). Additionally, newly developed dwarfs will cover a semi-Lagrangian, higher-order discontinuous Galerkin (DG) approach and a fault-tolerant implementation of an iterative elliptic solver.

Suitability of the research approach: WP1 will cover a wide range of relevant algorithms used in weather and climate sciences. It facilitates the comprehensive definition of a user-friendly and widely applicable DSL toolchain as well as the definition of highly-relevant benchmarks for vendors and HPC hardware developers. 

Measures for Success of the Work Package/ KPIs:  

  • Weather and climate dwarfs delivered, covering at least eight different algorithmic motifs (dwarfs).
  • Development of a large time-step, highly-scalable, higher-order method.
  • Proof-of-concept of the ANN approach.

Aim: WP2 will define, develop, and apply a DSL toolchain applicable to a comprehensive list of weather and climate dwarfs. The code adaptation and code generation via the DSL toolchain will be demonstrated for a number of representative and fundamentally different mathematical algorithms and horizontal discretizations. Moreover, WP2 will develop and promote APIs and generic interfaces across the DSL toolchain in order to improve reusability and inter-operability, and leverage code adaptation to emerging HPC architectures.

Approach and methodology: WP2 will build heavily on developments and expertise from the ESCAPE project and leverage existing open-source technology. Dwarfs are used as vehicles to design, prototype and demonstrate the newly developed DSL toolchain, in particular in the design of the DSL front-end. Following the development of a high-level DSL specification, with community-wide experts involved in the process, a DSL front-end will be designed that parses DSL code and translates it into a high-level intermediate representation (HIR). Subsequent parts of the modular DSL toolchain will be developed based on existing open-source technologies such as Atlas, GridTools, and Clang. Finally, the DSL toolchain will be applied to a wide range of weather and climate dwarfs.
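The separation that a DSL toolchain aims for can be illustrated with a deliberately small sketch. It is not the ESCAPE-2 DSL and uses no Atlas, GridTools, or Clang API; the names `LAPLACIAN`, `apply_stencil_reference`, and `apply_stencil_blocked` are hypothetical. The science-level description of a stencil is kept as plain data, and two interchangeable "backends" execute it with different loop structures while producing the same result.

```python
# Hypothetical illustration only: not the ESCAPE-2 DSL or any real toolchain API.

# "Science" level: a 1-D Laplacian described as data (offset -> weight).
LAPLACIAN = {-1: 1.0, 0: -2.0, 1: 1.0}


def apply_stencil_reference(stencil, field):
    """Simple backend: one plain loop over the interior points."""
    halo = max(abs(offset) for offset in stencil)
    out = list(field)  # boundary points are copied unchanged
    for i in range(halo, len(field) - halo):
        out[i] = sum(w * field[i + off] for off, w in stencil.items())
    return out


def apply_stencil_blocked(stencil, field, block=4):
    """Alternative backend: same arithmetic, interior traversed in blocks."""
    halo = max(abs(offset) for offset in stencil)
    out = list(field)
    for start in range(halo, len(field) - halo, block):
        stop = min(start + block, len(field) - halo)
        for i in range(start, stop):
            out[i] = sum(w * field[i + off] for off, w in stencil.items())
    return out


if __name__ == "__main__":
    field = [float(i * i) for i in range(10)]
    print(apply_stencil_reference(LAPLACIAN, field))
    print(apply_stencil_blocked(LAPLACIAN, field))
```

In this toy setting the stencil definition never changes when the execution strategy does, which is the property a real toolchain would need to preserve across hardware targets.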

Suitability of the research approach: Community-wide involvement in the iterative development of a high-level DSL frontend language (including the associated design decisions) ensures a comprehensive coverage of requirements and is key to achieving broad support and adoption. Building on existing, open-source technologies and integration with existing DSL efforts in Switzerland will ensure exploitable results within the scope of the ESCAPE-2 project.

Measures for Success of the Work Package/ KPIs:

  • DSL implementation of at least five dwarfs with increased readability and performance equal to or better than that of their reference implementation on any of the three targeted hardware architectures (x86 multi-core, NVIDIA Tesla, Intel Xeon Phi).

Aim: WP3 will develop a hierarchy of benchmarking components representing the key elements in the workflow of weather and climate prediction systems and re-integrate and test code adaptations generated from the DSL toolchain. This work will establish a representative High Performance Climate and Weather benchmark (HPCW). HPCW will serve as a benchmark for (pre)-exascale applications of climate and weather codes and will facilitate communication with HPC hardware developers and vendors. The value of HPCW will be demonstrated using the range of available hardware architectures.

Approach and methodology: WP3 will define the HPCW based on representative Earth system models from which key dwarfs will have been extracted (in WP1). Moreover, WP3 will ensure reliable and automatic verification by developing routines that check the correctness of the benchmark execution when different software implementations or different hardware options are explored. Several known approaches will be implemented and evaluated following the methodological developments in WP1 and WP2, and will also provide an evaluation option in the VVUQ framework of WP4. The HPCW benchmark establishes a comprehensive set of test cases and models featuring a number of representative algorithmic motifs, as well as system-sized workloads. The workload simulator Kronos (developed by the NextGenIO project) will be employed to create realistic operational scenarios for executing multiple workloads within a single benchmarking environment. This level of benchmarking is entirely new, and it will allow exploring the effect of complex resource contention that is not observable when single workloads are executed in isolation.
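One common way to check the correctness of a benchmark run across implementations or hardware is a tolerance-based comparison against a reference result, since bit-identical output cannot generally be expected. The following is a minimal sketch under that assumption; the function name `verify_run` and the tolerance values are hypothetical and are not taken from the HPCW benchmark.

```python
import math


def verify_run(reference, candidate, rel_tol=1e-10, abs_tol=1e-12):
    """Return True if every candidate value matches the reference within tolerance.

    Hypothetical helper: the tolerances are illustrative defaults, not values
    prescribed by any benchmark.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(cand, ref, rel_tol=rel_tol, abs_tol=abs_tol)
        for ref, cand in zip(reference, candidate)
    )


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    print(verify_run(reference, [1.0, 2.0 + 1e-13, 3.0]))  # small round-off
    print(verify_run(reference, [1.0, 2.1, 3.0]))          # genuine difference
```

A practical verification routine would likely also need field-specific tolerances, because different quantities can differ in sensitivity to reordering of floating-point operations.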

Suitability of the research approach: WP3 closely involves HPC centres and the leading European infrastructure vendor to ensure the suitability of the benchmark design as both user and vendor requirements with respect to HPC benchmark use and relevance will be addressed. In addition, leading European models provide a comprehensive suite of current and future requirements.

Measures for Success of the Work Package/ KPIs:

  • Number of selected algorithmic motifs (dwarfs) for which successful back-integration of DSL-toolchain generated code into models can be demonstrated.
  • Number of delivered components of the HPCW benchmark.
  • Number of performance analyses (per pair of benchmark code and hardware system).
  • Number of downloads of HPCW components from portal.

Aim: WP4 will develop a generic European VVUQ package for weather and climate simulations that is deployable on supercomputers and that prepares workloads of pre-exascale computations on many-core configurations. The VVUQ package will be demonstrated for both dwarf and full forecasting system workloads, and scenarios will be explored with optimized case performance based on the available VVUQ methodologies. WP4 aims at confronting ensemble-based and other methodologies to improve VVUQ practices and to produce a generic VVUQ framework for climate simulation at the European level. 

Approach and methodology: WP4 builds on the URANIE project, which was started at CEA to understand and quantify uncertainties in numerical simulations. URANIE was originally developed for one of the four main CEA areas of research and development, namely nuclear energy and the modelling of multi-physics phenomena in a nuclear plant. Its scope of application broadened once it became an open-source platform, and its user community has widened beyond the historical partners. The URANIE platform will be enhanced to capitalise on the approaches learned from the weather and climate community and to disseminate them to other science disciplines and use cases. The weather and climate prediction community has long experience with the quantification of uncertainties through ensemble methods. WP4 targets ensemble prediction methods, which run several instances of the same simulation with modified initial conditions and analyse the results produced. The work will start from a small but representative configuration and target a full, more realistic system. In order to provide VVUQ software ready for deployment, first developments and tests will be performed on the CEA and BSC HPC clusters. The final version, dealing with a production-level ensemble weather prediction model, will be deployed on the ECMWF cluster and potentially other platforms.
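The ensemble idea described above can be sketched in a few lines: run the same model several times from slightly perturbed initial conditions, then summarise the spread of the results. This is an illustration only. The logistic map in `toy_model` is a hypothetical stand-in for a forecast model, and none of the names below come from URANIE or any operational system.

```python
import random
import statistics


def toy_model(x0, steps=50, r=3.7):
    """Hypothetical stand-in for a forecast model: iterate the logistic map."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def run_ensemble(x0, members=20, perturbation=1e-3, seed=0):
    """Run the model once per member, each from a perturbed initial state."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    results = []
    for _ in range(members):
        perturbed = x0 + rng.uniform(-perturbation, perturbation)
        results.append(toy_model(perturbed))
    return results


def summarise(results):
    """Return the ensemble mean and the ensemble spread (sample std. dev.)."""
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    mean, spread = summarise(run_ensemble(0.4))
    print(f"ensemble mean = {mean:.4f}, spread = {spread:.4f}")
```

Each member is independent of the others, which is why ensemble workloads map naturally onto many-core systems: the members can run concurrently and only the final analysis needs all results.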

Suitability of the research approach: A two-way continuous exchange between both energy and weather and climate prediction communities is anticipated to benefit from their respective expertise, fostering substantial cross-disciplinary exchange of ideas and methodologies. The result is anticipated to benefit other research communities.

Measures for Success of the Work Package/ KPIs:

  • Providing a VVUQ package for the weather and climate prediction, and energy community with at least two successful deployments (and executions of the full-sized system) at computing clusters, demonstrating the usability and generality of selected VVUQ concepts across different HPC systems.

Aim: Prompting a paradigm shift in the understanding and use of DSLs and their impact on weather and climate model design and application, WP5 will focus on training, support and dissemination activities. In addition, WP5 will actively promote the dissemination and use of weather and climate dwarfs and the application of the HPCW benchmark in co-design, model development, as well as in training and education. WP6 coordinates the project and ensures that its innovation actions, objectives and impacts will be delivered.

Approach and methodology: WP5 will provide the public web portal, interactive Confluence development pages for remote working and exchange between partners, and a common software development and exchange platform suitable for rapid deployment and development in a distributed environment. A particular focus of WP5 is to ensure adequate training in, and dissemination of, the novel concepts. Training early-career scientists in the use of these concepts will foster community acceptance and showcase the gain in developer productivity that can be achieved by applying the DSL toolchain developed in ESCAPE-2.

The WP6 management structures will:

  • set up and maintain a structure, procedures and tools that will allow a coherent and efficient technical and administrative management of the project;
  • keep the project on time and within the assigned budget;
  • identify and manage risks and solve problems;
  • identify opportunities for improved results and collaboration;
  • coordinate the interactions between work packages and partners;
  • provide and manage working procedures ensuring transparency within the team and for the EC;
  • manage quality assurance.

Suitability of the research approach: The common software development platform based on Atlassian tools hosted at ECMWF has proven successful in the ESCAPE project and will be continued in this project. An additional benefit derives from the continued support and promotion of ESCAPE outcomes, together with the exchange of the novel mathematical and algorithmic developments of ESCAPE-2 in a common development environment. Early exposure of domain scientists, both experts and early-career researchers, to the novel development concepts based on a code-generating DSL toolchain is crucial. This will provide important feedback that can influence design choices, thus avoiding costly redesign at a later project stage.

Measures for Success of the Work Package/ KPIs:

  • Successful provision of all relevant communication tools between project partners, stakeholders and the public.
  • Successful interaction with domain scientists and early-career researchers in the form of at least two dissemination and discussion workshops and one summer school.
Project Number: 800897
Topic: FETHPC-02-2017 - Transition to Exascale Computing
Start Date: 1st October 2018
Duration: 36 months
EU Funding: EUR 3,999,650.00