exaCB: Reproducible Continuous Benchmark Collections at Scale Leveraging an Incremental Approach
This work addresses performance and energy-efficiency evaluation for HPC application developers and system administrators. Its contribution is incremental in that it builds on existing CI/CD practices.
The paper tackles the challenge of systematic and reproducible performance evaluation in heterogeneous high-performance computing (HPC) systems transitioning to exascale architectures. It introduces exaCB, a continuous benchmarking framework that integrates into CI workflows and has enabled benchmarking of over 70 applications, with support for cross-application analysis and energy-aware studies.
The increasing heterogeneity of high-performance computing (HPC) systems and the transition to exascale architectures require systematic and reproducible performance evaluation across diverse workloads. While continuous integration (CI) ensures functional correctness in software engineering, performance and energy efficiency in HPC are typically evaluated outside CI workflows, motivating continuous benchmarking (CB) as a complementary approach. Integrating benchmarking into CI workflows enables reproducible evaluation, early detection of regressions, and continuous validation throughout the software development lifecycle. We present exaCB, a framework for continuous benchmarking developed in the context of the JUPITER exascale system. exaCB enables application teams to integrate benchmarking into their workflows while supporting large-scale, system-wide studies through reusable CI/CD components, established harnesses, and a shared reporting protocol. The framework supports incremental adoption, allowing benchmarks to be onboarded easily and to evolve from basic runnability to more advanced instrumentation and reproducibility. The approach is demonstrated in JUREAP, the early-access program for JUPITER, where exaCB enabled continuous benchmarking of over 70 applications at varying maturity levels, supporting cross-application analysis, performance tracking, and energy-aware studies. These results illustrate the practicality of using exaCB for continuous benchmarking on exascale HPC systems across large, diverse collections of scientific applications.