Sergey Lesnik

64.8 | CE | Mar 16 | Code

SCALE-TRACK: Asynchronous Euler-Lagrange particle tracking on heterogeneous computing architecture

Silvio Schmalfuß, Sergey Lesnik, Henrik Rusche et al.

Euler-Lagrange (EL) simulations provide a direct and robust framework for modeling disperse multiphase flows. However, they are computationally expensive. While various approaches have attempted to leverage heterogeneous computing architectures, they have encountered scalability limitations. We present SCALE-TRACK, a scalable two-way coupled EL particle tracking algorithm, designed to exploit heterogeneous exascale computing environments. With asynchronous coupling, cache-friendly data structures, and chunk-based partitioning, we address key limitations of existing EL implementations. Validations against an analytical solution and a conventional EL implementation demonstrate the accuracy of the proposed algorithms. On a local workstation, we simulated 1.4 billion particles in a test case featuring a single graphics processing unit (GPU). Scaling runs on an HPC (high-performance computing) cluster show excellent strong and weak scaling, with up to 256 billion particles being tracked on up to 256 GPUs. This represents a significant advancement for EL simulations, enabling high-fidelity simulations on local workstations and pushing the limits on HPC systems. The software is released as open source and is publicly available.
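The strong and weak scaling mentioned in the abstract are usually quantified as parallel efficiencies. The following is a minimal sketch using the conventional textbook definitions; the function names and timing values are hypothetical and are not taken from the paper or from the SCALE-TRACK code.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling: total problem size is fixed, so the ideal
    run time drops in proportion to the number of devices."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: problem size per device is fixed, so the ideal
    run time stays constant as devices are added."""
    return t_ref / t_n


# Hypothetical wall-clock times in seconds, for illustration only.
strong = strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=30.0, n=4)
weak = weak_scaling_efficiency(t_ref=100.0, t_n=110.0)
print(f"strong scaling efficiency: {strong:.2f}")
print(f"weak scaling efficiency:   {weak:.2f}")
```

An efficiency close to 1 in either measure corresponds to what the abstract calls excellent scaling.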

35.5 | DC | Mar 29

The First OpenFOAM HPC Challenge (OHC-1)

Sergey Lesnik, Gregor Olenik, Mark Wassermann

The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware and to compare hardware-constrained submissions with software-track optimisations. Participants ran a common incompressible steady-state RANS case, the open-closed cooling DrivAer (occDrivAer) configuration, on prescribed meshes, submitting either with the reference setup (hardware track) or with modified solvers, decomposition strategies, or accelerator offloading (software track). In total, 237 valid datapoints were submitted by 12 contributors: 175 in the hardware track and 62 in the software track. The hardware track covered 25 distinct CPU models across AMD, Intel, and ARM families, with runs spanning from single-node configurations up to 256 nodes (32768 CPU cores). Wall-clock times ranged from 7.8 minutes to 65.7 hours and reported energy-to-solution from 2.1 to 236.9 kWh. Analysis of the hardware track identified a Pareto front of optimal balance between time- and energy-to-solution, and revealed that on-package high-bandwidth memory (HBM) dominates single-node performance for next-generation CPUs. Software-track submissions achieved up to 28% lower energy per iteration, 17% higher maximum performance per node, and 72% shorter minimum time per iteration than the best hardware-track results, with full GPU ports and selective-memory optimisations leading the performance range. This manuscript describes the challenge organisation, the case setup and metrics, and presents the main findings from both tracks together with an outlook for future challenges.
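The Pareto front between time- and energy-to-solution can be illustrated with a short sketch. It assumes each run is reduced to a (time, energy) pair in which lower is better for both values; the function name and the data points are hypothetical and are not results from the challenge.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point dominates another if it is no worse in both time and
    energy and strictly better in at least one of them.
    """
    front = []
    for i, (t_i, e_i) in enumerate(points):
        dominated = any(
            (t_j <= t_i and e_j <= e_i) and (t_j < t_i or e_j < e_i)
            for j, (t_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((t_i, e_i))
    return sorted(front)


# Hypothetical runs as (time-to-solution in hours, energy-to-solution in kWh).
runs = [(0.5, 40.0), (1.0, 10.0), (2.0, 12.0), (4.0, 3.0), (0.6, 45.0)]
front = pareto_front(runs)
print(front)
```

Runs on the front represent different trade-offs: moving along it, a shorter time-to-solution can only be bought with a higher energy-to-solution, and vice versa.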
