Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage
For HPC performance modelers, this method works around the limit on how many hardware counters can be collected simultaneously by merging traces from multiple runs, though it is an incremental improvement over existing multiplexing techniques.
This work proposes a heuristic-based method to merge execution traces from multiple HPC runs, each with different hardware counters, to create a unified dataset with extended counter coverage. The approach maintains acceptable accuracy and enables training ML models on a richer feature space without prior counter selection.
This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be collected simultaneously. To address this, we propose a heuristic-based methodology to merge execution traces from multiple runs, each instrumented with a different set of hardware counters. Our approach matches computation bursts across executions by analyzing MPI structure, timing, and communication patterns. This process enables the construction of a unified dataset that includes a wider set of hardware features without relying on multiplexing. The output is a new synthetic trace containing all merged counters, which can be used both for HPC performance prediction and for conventional performance analysis. The methodology has been validated on the MareNostrum5 machine with a range of kernels and real applications. Results show that the accuracy of the merged counters varies by application but remains acceptable, and that the merged counters can be used directly to train ML models on a richer feature space without prior counter selection.
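To make the burst-matching idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each computation burst is described by its MPI rank, the MPI calls that surround it, and its duration. Bursts from two runs are paired when they share the same structural key and occur in the same program order, and a pair is accepted only if the durations agree within a relative tolerance. The `Burst` type, the `merge_runs` function, the `rel_tol` threshold, and the counter names in the example are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Burst:
    """One computation burst between two MPI calls (hypothetical layout)."""
    rank: int            # MPI rank that executed the burst
    prev_call: str       # MPI call immediately before the burst
    next_call: str       # MPI call immediately after the burst
    duration_us: float   # burst duration in microseconds
    counters: dict = field(default_factory=dict)  # counter name -> value


def structural_key(burst):
    # Assumed heuristic: bursts are comparable if they run on the same
    # rank and sit between the same pair of MPI calls.
    return (burst.rank, burst.prev_call, burst.next_call)


def merge_runs(run_a, run_b, rel_tol=0.2):
    """Pair bursts of two runs and merge their counter sets.

    Returns (merged, unmatched), where unmatched holds the bursts of
    run_a for which no acceptable partner was found in run_b.
    """
    # Group the bursts of run_b by structural key, preserving program order.
    candidates = defaultdict(list)
    for b in run_b:
        candidates[structural_key(b)].append(b)

    merged, unmatched = [], []
    for a in run_a:
        queue = candidates[structural_key(a)]
        # Timing check: accept the next candidate in program order only
        # if its duration is within rel_tol of the reference burst.
        if queue and abs(queue[0].duration_us - a.duration_us) <= rel_tol * a.duration_us:
            b = queue.pop(0)
            merged.append(
                Burst(a.rank, a.prev_call, a.next_call, a.duration_us,
                      {**a.counters, **b.counters})
            )
        else:
            unmatched.append(a)
    return merged, unmatched


# Illustrative data: two runs of the same program with different counters.
run_a = [
    Burst(0, "MPI_Init", "MPI_Send", 100.0, {"PAPI_TOT_INS": 1000}),
    Burst(0, "MPI_Send", "MPI_Recv", 50.0, {"PAPI_TOT_INS": 400}),
]
run_b = [
    Burst(0, "MPI_Init", "MPI_Send", 105.0, {"PAPI_L1_DCM": 10}),
    Burst(0, "MPI_Send", "MPI_Recv", 90.0, {"PAPI_L1_DCM": 7}),
]
merged, unmatched = merge_runs(run_a, run_b)
```

In this example the first pair of bursts is merged because the durations differ by 5%, while the second burst of `run_a` is left unmatched because its candidate is 80% longer. A fuller heuristic would presumably also compare communication patterns, such as message sizes and partner ranks, which this sketch omits.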