CompPow: A Case for Component-level GPU Power Management
For ML datacenter operators and GPU architects, this work addresses the problem of GPU power inefficiency by advocating for component-aware power management, though it is an incremental step rather than a breakthrough.
The paper proposes CompPow, a component-level power management approach for GPUs, and shows that it has the potential to deliver 10% higher energy efficiency and 5% improved performance across a variety of ML operations and execution patterns.
The ever-increasing demand for ML-driven intelligence across a wide spectrum of domains has made GPUs ubiquitous. At the same time, GPUs are notorious for their power consumption and often dominate the power allocation in a typical ML datacenter. While datacenter-level power optimizations that focus on collections of GPUs are promising, in this work we take a different tack: we take a closer look at power consumption inside a GPU. Specifically, as modern GPUs are composed of integrated components, we make a case for component-awareness, termed CompPow in this work, for improved power management in modern GPUs. We demonstrate that, for a variety of ML operations and execution patterns, CompPow has the potential to deliver higher energy efficiency (10%) and even improved performance (5%). We conclude with recommendations on how component-aware software-hardware co-design can extract additional energy efficiency from modern GPUs.