DCPFMar 27

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP

arXiv:2603.2657634.1h-index: 12
Predicted impact top 48% in DC · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses performance optimization for developers in high-performance computing using heterogeneous platforms, though it is incremental as it builds on an existing framework.

The authors tackled the challenge of performance analysis in heterogeneous HPC systems by extending the POP efficiency framework to include new metrics for host and device efficiency, implemented in the TALP module, which exposed inefficiencies in offloading, load balance, and orchestration in synthetic benchmarks and three production applications.

The increasing adoption of heterogeneous platforms that combine CPUs with accelerators such as GPUs in high-performance computing (HPC) introduces new challenges for performance analysis and optimization. Traditional efficiency metrics, such as those proposed by the Performance Optimization and Productivity (POP) Center of Excellence, were designed primarily for homogeneous CPU-based systems and therefore, do not capture the complex interactions between host and device resources. In this work, we extend the POP efficiency framework to heterogeneous architectures by introducing a new hierarchy of metrics that separately evaluate host and device efficiency. On the host side, we quantify the effectiveness of hybrid execution and offloading operations. On the device side, we propose a multiplicative hierarchy analogous to the host hierarchy and define its Parallel Efficiency branch. Beyond their definition and formulation, we present the implementation of these metrics in the TALP module of the DLB library. TALP is a lightweight monitoring library that provides measurements both post mortem and at runtime, with outputs available in textual and machine-readable formats. We validate the proposed framework through synthetic benchmarks and three production HPC applications, demonstrating how the metrics expose inefficiencies in offloading, load balance, and orchestration. Results show that the extended TALP metrics provide actionable insights to guide developers in optimizing heterogeneous HPC codes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes