The Numerical Engine Room Talks are an interdisciplinary seminar series at the intersection of numerical methods, algorithm development, and high-performance computing. A particular focus in these talks is on the internals of the "numerical engine room", i.e., on details such as implementation notes, performance numbers, or specific corner cases. We invite our guest speakers and the audience to engage in discussions at this level, to facilitate a scientific exchange that goes beyond the polished results found in the literature.
We host the Numerical Engine Room Talks on a semi-regular basis, approximately once every month or two, via Zoom. If you are interested, please send an email to Michael to be included in the regular announcement emails.
Michael Schlottke-Lakemper, University of Augsburg
Gregor Gassner, University of Cologne
Hendrik Ranocha, Johannes Gutenberg University Mainz
ExaHyPE, a code funded by the former EU FET-HPC project of the same name, was designed to provide exascale-ready solvers for hyperbolic PDEs on dynamically adaptive Cartesian meshes using ADER-DG. By the time the ExaHyPE project had finished, we finally had an idea of what the ExaHyPE code should have looked like. We therefore rewrote the whole software stack from scratch: a new AMR baseline (Peano 4), a new PDE solver layer (ExaHyPE 2), and two new domain-specific extensions, ExaSeis 2 for seismology and ExaGRyPE for astrophysics, which previously had been intermingled with ExaHyPE's core.
Some of these rewrites were unavoidable. The most significant change of direction results from the rise of GPUs: the original ExaHyPE had been designed with KNLs in mind, so with ExaHyPE 2 we had to redesign the code for accelerators. Other decisions, such as the sole commitment to TBB (we now support OpenMP, TBB, C++, and SYCL) or to task-based parallelism only, turned out to be too specific and ideological to yield high performance on mainstream hardware.
This talk offers a high-level overview of ExaHyPE 2, motivated by the astrophysical challenges that drove ExaGRyPE's development, notably the simulation of binary black hole mergers. Its second part dives into the parallelisation techniques that enable seamless CPU/GPU scaling with minimal user effort. Upon request, we can provide anecdotal information on alternative approaches that we tried out in previous software generations.
Three meta questions remain: Did we get the abstraction level right this time for extensibility and maintainability? Will we need to rewrite the code again as we move to new machine architectures? Are we utilising the right external libraries and tools, or have we reinvented wheels and bet on the wrong horses?
TrixiLW.jl is a Lax-Wendroff (LW) PDE solver that achieves high-order accuracy through a coupled temporal and spatial discretization. This contrasts with the method-of-lines approach of Trixi.jl, where a high-order spatial discretization is followed by a multi-stage ODE solver for the temporal discretization. LW schemes are friendly to modern memory-bound CPUs because they advance to the next time level in a single stage, minimizing traffic between RAM and cache in serial code and across nodes in parallel code. Tenkai.jl is the Cartesian LW code used in our first and second papers; it took design and optimization inspiration from Trixi.jl. TrixiLW.jl was subsequently written to extend the LW discretization to curvilinear grids, using Trixi.jl as a library. The talk will begin by briefly discussing the optimization lessons Tenkai.jl took from Trixi.jl and then focus on how we exploited Julia's multiple dispatch, along with the modularity and generality of Trixi.jl, to write TrixiLW.jl.
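To illustrate the multiple-dispatch pattern in rough strokes, here is a minimal, self-contained sketch; all type and function names are hypothetical and not the actual Trixi.jl/TrixiLW.jl API. Two time discretizations share one spatial operator, and the driver stays agnostic to which scheme is in use:

```julia
# Minimal sketch of the multiple-dispatch pattern (all names hypothetical,
# not the actual Trixi.jl/TrixiLW.jl API): two time discretizations share
# one spatial operator, and the driver is agnostic to which one is used.

abstract type TimeDiscretization end
struct MethodOfLines <: TimeDiscretization end  # multi-stage, Trixi.jl-style
struct LaxWendroff <: TimeDiscretization end    # single-stage, TrixiLW.jl-style

# Shared spatial residual for u_t = -a u_x (periodic upwind differences)
rhs(u, a, dx) = -a .* (u .- circshift(u, 1)) ./ dx

# Method of lines: a two-stage SSP Runge-Kutta step built on top of rhs
function update(u, dt, a, dx, ::MethodOfLines)
    u1 = u .+ dt .* rhs(u, a, dx)
    return 0.5 .* (u .+ u1 .+ dt .* rhs(u1, a, dx))
end

# Lax-Wendroff: a single stage coupling space and time via u_tt = a^2 u_xx
function update(u, dt, a, dx, ::LaxWendroff)
    ux = (circshift(u, -1) .- circshift(u, 1)) ./ (2dx)
    uxx = (circshift(u, -1) .- 2 .* u .+ circshift(u, 1)) ./ dx^2
    return u .- a * dt .* ux .+ 0.5 * (a * dt)^2 .* uxx
end

# The driver is written once; dispatch selects the update rule
advance(u, dt, a, dx, disc::TimeDiscretization, nsteps) =
    foldl((v, _) -> update(v, dt, a, dx, disc), 1:nsteps; init = u)

u0 = sin.(2π .* (0:63) ./ 64)
u_mol = advance(u0, 0.01, 1.0, 1 / 64, MethodOfLines(), 10)
u_lw = advance(u0, 0.01, 1.0, 1 / 64, LaxWendroff(), 10)
```

A new scheme only overloads the methods whose behavior changes; everything built on the shared interface, such as the driver above, is reused unchanged.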
preCICE is an open-source coupling software for partitioned multi-physics and multi-scale simulations. Thanks to the software's library approach (the simulations call the coupling library) and its high-level API, only minimally invasive changes are required to prepare an existing (legacy) simulation software for coupling. Moreover, ready-to-use adapters for many popular simulation software packages are available, e.g. for OpenFOAM, SU2, CalculiX, FEniCS, and deal.II. For the actual coupling, preCICE offers methods for fixed-point acceleration (quasi-Newton acceleration), fully parallel communication (MPI or TCP/IP), data mapping (radial-basis function interpolation), and time interpolation (waveform relaxation). Today, although an academic software project at heart, preCICE is used by more than 100 research groups in both academia and industry.
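To give a flavour of how minimally invasive the library approach is, here is a rough sketch of a coupled solver's time loop using the Julia bindings PreCICE.jl. We assume the v2-style API here; exact function names and signatures depend on the preCICE version, and the mesh and data setup is elided:

```julia
# Rough sketch of a coupled solver's time loop with the Julia bindings
# PreCICE.jl, assuming the v2-style API; names and signatures may differ
# in other preCICE versions, and the mesh/data setup is elided.

using PreCICE

PreCICE.createSolverInterface("FluidSolver", "precice-config.xml", 0, 1)

# ... define the coupling mesh vertices and data fields here ...

dt = PreCICE.initialize()

while PreCICE.isCouplingOngoing()
    # ... read coupling data, e.g. displacements from the solid solver ...
    # ... run one time step of the existing solver with step size dt ...
    # ... write coupling data, e.g. forces on the shared interface ...
    dt = PreCICE.advance(dt)
end

PreCICE.finalize()
```

The existing solver keeps its own main loop; the coupling library only dictates the step size and the points at which data is exchanged.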
Automatic differentiation (AD) is a powerful technique in numerical computing. In this talk, we will explore the tradeoffs of implementing AD at different levels, and discuss automatic differentiation for parallel programs and high-level programming languages. The particular focus will be on Enzyme, an AD framework that operates at the compiler level and supports reverse- and forward-mode AD in a variety of languages, such as Julia, C/C++, Fortran, and others.
This talk is based on the paper "Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation" by Moses et al., winner of the Best Student Paper Award at SC22.
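As a small taste of what compiler-level AD looks like from the user's side, here is a minimal example with the Julia front end Enzyme.jl; note that the exact call signatures and return shapes vary across Enzyme.jl versions:

```julia
# Minimal Enzyme.jl example; exact call signatures and return shapes
# vary across Enzyme.jl versions.
using Enzyme

f(x) = x^2 + 3x  # toy scalar function with df/dx = 2x + 3

# Reverse mode: differentiate with respect to the Active argument
g_rev = autodiff(Reverse, f, Active, Active(2.0))   # ((7.0,),)

# Forward mode: seed a unit tangent to obtain the directional derivative
g_fwd = autodiff(Forward, f, Duplicated(2.0, 1.0))  # (7.0,)
```

Because Enzyme differentiates the compiler's intermediate representation rather than source code, the same mechanism applies to code written in C/C++, Fortran, and other LLVM-based languages.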
JAX-FLUIDS is a CFD solver written in Python that uses the JAX framework to enable automatic differentiation (AD). This makes it easy to build applications for data-driven simulations and other optimization problems. The talk is based on the recent preprint "JAX-FLUIDS: A fully-differentiable high-order computational fluid dynamics solver for compressible two-phase flows", arXiv:2203.13760.