Harald Köstler

2 Papers

20.8 CE Mar 18
Automated Grammar-based Algebraic Multigrid Design With Evolutionary Algorithms

Dinesh Parthasarathy, Wayne Mitchell, Arjun Gambhir et al.

Although multigrid is asymptotically optimal for solving many important partial differential equations, its efficiency relies heavily on the careful selection of the individual algorithmic components. In contrast to recent approaches that optimize certain multigrid components using deep learning techniques, we adopt a complementary strategy, employing evolutionary algorithms to construct efficient multigrid cycles from proven algorithmic building blocks. Here, we present the application of this strategy to generating efficient algebraic multigrid methods with so-called flexible cycling, that is, level-specific smoothing sequences and non-recursive cycling patterns. The search space of such non-standard cycles is intractable to navigate manually, so we explore it with genetic programming (GP) guided by context-free grammars. Numerical experiments with the linear algebra library hypre demonstrate the potential of these non-standard GP cycles to improve multigrid performance both as a solver and as a preconditioner.
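
To make the idea of grammar-guided cycle construction concrete, the following is a minimal sketch of drawing one random flexible cycle from a toy context-free grammar. It is not the grammar or the GP system used by the authors: the grammar rules, the operation names (`jacobi`, `gauss_seidel`, `restrict`, `prolong`, `coarse_solve`) and the functions `build_grammar`, `derive` and `is_valid_cycle` are all hypothetical and chosen only for illustration. Only the sampling step is shown; a GP search would additionally mutate and recombine such derivations and score each one by running the resulting solver, which is omitted here.

```python
import random


def build_grammar(num_levels):
    """Toy grammar (an assumption, not the paper's) for levels 0 (finest) to num_levels-1."""
    grammar = {}
    coarsest = num_levels - 1
    for l in range(coarsest):
        # S<l>: level-specific smoothing sequence, possibly empty.
        grammar[f"S{l}"] = [
            [],
            [f"jacobi@{l}", f"S{l}"],
            [f"gauss_seidel@{l}", f"S{l}"],
        ]
        # C<l>: descend once, then either finish with smoothing or start
        # another (independently chosen) visit from the same level.
        grammar[f"C{l}"] = [
            [f"S{l}", f"restrict@{l}", f"C{l + 1}", f"prolong@{l}", f"S{l}"],
            [f"S{l}", f"restrict@{l}", f"C{l + 1}", f"prolong@{l}", f"C{l}"],
        ]
    grammar[f"C{coarsest}"] = [[f"coarse_solve@{coarsest}"]]
    return grammar


def derive(grammar, symbol, rng, budget):
    """Randomly expand `symbol` into a flat list of operations.

    Once `budget` is used up, production 0 is always taken; in this toy
    grammar production 0 of every nonterminal terminates.
    """
    if symbol not in grammar:
        return [symbol]  # terminal operation such as "restrict@0"
    rules = grammar[symbol]
    rule = rng.choice(rules) if budget > 0 else rules[0]
    ops = []
    for child in rule:
        ops.extend(derive(grammar, child, rng, budget - 1))
    return ops


def is_valid_cycle(ops):
    """Check that every operation is applied on the level the cycle is currently at."""
    level = 0
    for op in ops:
        name, at = op.split("@")
        at = int(at)
        if name == "restrict":
            if level != at:
                return False
            level += 1
        elif name == "prolong":
            if level != at + 1:
                return False
            level -= 1
        elif level != at:
            return False
    return level == 0  # a full cycle ends back on the finest level


grammar = build_grammar(num_levels=3)
cycle = derive(grammar, "C0", random.Random(0), budget=6)
print(" ".join(cycle))
print("valid:", is_valid_cycle(cycle))
```

Because the second `C<l>` production restarts from the same level with freshly chosen smoothers, repeated visits to a level need not look alike, which is the sense in which such cycles are non-recursive. With `budget=0` the sketch degenerates to a plain V-cycle without smoothing.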

11.5 PF Mar 17
AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models

Martin Mayr, Sebastian Wind, Lukas Schröder et al.

Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and push their power and energy demands towards a critical level. AI benchmarks that represent state-of-the-art workloads, together with an understanding of their performance-energy trade-offs, are critical for deploying efficient infrastructures and can guide energy-efficiency measures such as power capping. We introduce a benchmarking framework with popular deep learning applications from computer vision (image classification and generation) and large language models (continued pre-training and inference) implementing modern methods. Our performance analysis focuses on throughput rather than on time to completion, the standard metric in HPC. We analyse performance and energy efficiency under various power capping scenarios on NVIDIA H100, NVIDIA H200, and AMD MI300X GPUs. Our results reveal that no universal optimal power cap exists, as the efficiency peak varies across application types and GPU architectures. Interestingly, the two NVIDIA GPUs, which differ mainly in their HBM configuration, show qualitatively different performance-energy trade-offs. The developed benchmarking framework will be released as a public tool.
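
As a small illustration of what an efficiency peak under power capping means, the sketch below computes energy efficiency as throughput divided by average power (samples per joule) and picks the cap with the highest value. The function names `samples_per_joule` and `best_cap` are hypothetical, and the numbers in `runs` are invented placeholders, not measurements from the paper or from any real GPU; the benchmarking framework itself may define its metrics differently.

```python
def samples_per_joule(throughput, avg_power_w):
    """Energy efficiency: (samples/s) / (J/s) = samples per joule."""
    return throughput / avg_power_w


def best_cap(runs):
    """Return the power cap whose run has the highest samples-per-joule value.

    `runs` is a list of (cap_w, avg_power_w, throughput) tuples.
    """
    return max(runs, key=lambda r: samples_per_joule(r[2], r[1]))[0]


# Invented placeholder values (cap in W, measured average power in W,
# throughput in samples/s); NOT results from the paper.
runs = [
    (700, 650.0, 1000.0),
    (500, 480.0, 900.0),
    (300, 295.0, 500.0),
]

for cap, power, throughput in runs:
    print(cap, round(samples_per_joule(throughput, power), 3))
print("most efficient cap:", best_cap(runs))
```

In this made-up data the middle cap is the most efficient one, because lowering the cap further costs more throughput than it saves in power. A different application or GPU would produce a different curve, which is why the peak has to be measured rather than assumed.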