PF · LG · May 15

Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage

arXiv:2605.15832 · 26.1
Predicted impact: top 29% in PF (last 90 days) · Originality: Synthesis-oriented
AI Analysis

For HPC performance modelers, this method addresses the hardware counter limitation by merging traces, but it is an incremental improvement over existing multiplexing techniques.

This work proposes a heuristic-based method to merge execution traces from multiple HPC runs, each with different hardware counters, to create a unified dataset with extended counter coverage. The approach maintains acceptable accuracy and enables training ML models on a richer feature space without prior counter selection.

This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be collected simultaneously. To address this, we propose a heuristic-based methodology to merge execution traces from multiple runs, each instrumented with a different set of hardware counters. Our approach matches computation bursts across executions by analyzing MPI structure, timing, and communication patterns. This process enables the construction of a unified dataset that includes a wider set of hardware features without relying on multiplexing. The output is a new synthetic trace with all merged counters, which can be used both for HPC performance prediction and for conventional performance analysis. The methodology has been validated on the MareNostrum5 machine with a range of kernels and real applications. Results show that the merged counters maintain acceptable accuracy, to a degree that depends on the application, and can be used directly to train ML models on a richer feature space without prior counter selection.
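The abstract does not spell out the matching heuristics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each computation burst is described by its rank, the MPI calls that bound it, and its duration. It pairs bursts from two runs in program order when their MPI context agrees and their durations are within a relative tolerance. The `Burst` record, the `merge_traces` function, the 10% tolerance, and the choice of counters are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Burst:
    """Hypothetical record for one computation burst between two MPI calls."""
    rank: int
    prev_mpi: str          # MPI call that ended just before the burst
    next_mpi: str          # MPI call that starts right after the burst
    duration_us: float
    counters: dict = field(default_factory=dict)


def by_rank(trace):
    """Group bursts by MPI rank, preserving their order within each rank."""
    groups = defaultdict(list)
    for burst in trace:
        groups[burst.rank].append(burst)
    return groups


def merge_traces(trace_a, trace_b, rel_tol=0.10):
    """Pair bursts positionally per rank and merge their counter sets.

    A pair is accepted only if both bursts have the same MPI context and
    their durations differ by at most rel_tol (assumed threshold).
    Returns the merged bursts and the number of rejected pairs.
    """
    merged, unmatched = [], 0
    a_ranks, b_ranks = by_rank(trace_a), by_rank(trace_b)
    for rank in sorted(a_ranks):
        for a, b in zip(a_ranks[rank], b_ranks.get(rank, [])):
            same_structure = (a.prev_mpi, a.next_mpi) == (b.prev_mpi, b.next_mpi)
            longest = max(a.duration_us, b.duration_us)
            close_in_time = (
                longest == 0
                or abs(a.duration_us - b.duration_us) / longest <= rel_tol
            )
            if same_structure and close_in_time:
                merged.append(Burst(
                    rank, a.prev_mpi, a.next_mpi,
                    (a.duration_us + b.duration_us) / 2,   # synthetic duration
                    {**a.counters, **b.counters},          # union of counters
                ))
            else:
                unmatched += 1
    return merged, unmatched


# Two runs of the same program, each collecting a different counter.
run_a = [
    Burst(0, "MPI_Init", "MPI_Send", 100.0, {"PAPI_TOT_INS": 1000}),
    Burst(0, "MPI_Send", "MPI_Recv", 50.0, {"PAPI_TOT_INS": 400}),
]
run_b = [
    Burst(0, "MPI_Init", "MPI_Send", 104.0, {"PAPI_L1_DCM": 20}),
    Burst(0, "MPI_Send", "MPI_Recv", 80.0, {"PAPI_L1_DCM": 9}),
]
merged, unmatched = merge_traces(run_a, run_b)
```

In this toy input the first pair is merged into one burst carrying both counters, while the second pair is rejected because the durations differ by more than the tolerance. A realistic matcher would also need to handle runs whose burst sequences differ in length and to use communication patterns, which this sketch omits.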

