PF, Apr 13

Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200

Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein

arXiv:2604.11391 | 27.4 | h-index: 39

AI Analysis

Provides guidance for selecting GPU architectures in energy-constrained environments, though findings are incremental.

This study compares power-capped performance of NVIDIA H100 and H200 GPUs, finding that H100 is slightly better for compute-bound workloads while H200 is more efficient for memory-bound tasks.

Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key variable, the power distribution between the memory and the Streaming Multiprocessors (SMs) differs notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics impact performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and uncover outliers that consume more memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, representing the two extremes of the Roofline model. Our observations indicate that across varying power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
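For readers unfamiliar with the Roofline model mentioned in the abstract, the following is a minimal sketch of how it bounds performance and how a performance-per-watt figure follows from it. The function names and every numeric value are hypothetical placeholders chosen for illustration; they are not measurements of the H100 or H200 and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Attainable performance under the Roofline model.

    Performance is limited either by the compute peak or by
    memory bandwidth times arithmetic intensity, whichever is lower.
    """
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def gflops_per_watt(gflops, watts):
    """Energy efficiency as performance divided by power draw."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gflops / watts


# Hypothetical device parameters (placeholders, not real GPU data).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Ridge point: the intensity at which a kernel stops being memory-bound.
ridge_intensity = PEAK_GFLOPS / BANDWIDTH_GBS

# A low-intensity (memory-bound) and a high-intensity (compute-bound) kernel.
memory_bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 0.125)
compute_bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 50.0)

# Efficiency of the memory-bound kernel at a hypothetical 250 W power cap.
efficiency = gflops_per_watt(memory_bound, 250.0)
```

In this picture, a memory-bound kernel sits on the bandwidth-limited slope and gains from higher memory bandwidth, while a compute-bound kernel sits on the flat compute ceiling and does not. This is why the two workload classes can respond differently to the same power cap.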
