DC · Mar 19

From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning

arXiv:2603.18383 · 62.0 · h-index: 14
Predicted impact: top 16% in DC · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the need for precise power modeling in infrastructure planning for datacenter operators and electrical utilities, enabling better provisioning and capacity planning, though it is incremental as it builds on existing trace-generation methods.

The paper tackles the problem of generating accurate power traces for LLM inference workloads in datacenters. Existing datacenter power models miss these workloads because GPUs shift rapidly among prefill, decode, and idle states. The authors develop a compositional framework that achieves median absolute energy error below 5% for most configurations while preserving temporal autocorrelation structure.

Datacenter operators and electrical utilities rely on power traces at different spatiotemporal scales. Operators use fine-grained traces for provisioning, facility management, and scheduling, while utilities use site-level load profiles for capacity and interconnection planning. Existing datacenter power models do not capture LLM inference workloads, in which GPUs shift rapidly among compute-intensive prefill, lower-power decode, and idle states, and facility demand depends on how these states evolve and synchronize across many devices. We show that LLM inference power can be represented compositionally through two components: workload-driven transitions among operating states and configuration-specific power distributions within those states. Building on this observation, we develop a trace-generation framework that learns from measured traces and synthesizes power profiles for new traffic conditions and serving configurations. These traces aggregate from GPU servers to rack-, row-, and facility-scale load profiles at the temporal granularity required by the study. Across multiple LLMs, tensor-parallel settings, and GPU generations, our framework achieves median absolute energy error below 5% for most configurations while preserving temporal autocorrelation structure. The resulting traces support downstream analyses including oversubscription, power modulation, and utility-facing load characterization, enabling infrastructure evaluations that flat nameplate assumptions and static trace replay cannot support.
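
The two-component idea in the abstract (state transitions plus per-state power distributions, aggregated upward) can be illustrated with a minimal sketch. This is not the authors' framework: the transition probabilities, the Gaussian power parameters, the `arrival_p` knob, and every function name below are hypothetical placeholders chosen for illustration, and the paper learns its components from measured traces rather than hard-coding them.

```python
import random

# Assumed per-step transition probabilities for the busy states (illustrative only).
BUSY_TRANSITIONS = {
    "prefill": {"idle": 0.00, "decode": 0.70, "prefill": 0.30},
    "decode": {"idle": 0.05, "decode": 0.85, "prefill": 0.10},
}

# Assumed per-state power distributions for one GPU as (mean W, std W).
# In the abstract these are configuration-specific; the numbers here are made up.
POWER_W = {"idle": (90.0, 5.0), "decode": (350.0, 30.0), "prefill": (620.0, 40.0)}


def gpu_trace(n_steps, rng, arrival_p=0.1, start="idle"):
    """Sample one GPU power trace: a Markov walk over states, then a power draw.

    arrival_p is a hypothetical stand-in for workload intensity: the chance
    that an idle GPU starts a prefill on a given step.
    """
    state, trace = start, []
    for _ in range(n_steps):
        mean, std = POWER_W[state]
        trace.append(max(0.0, rng.gauss(mean, std)))
        if state == "idle":
            nxt = {"idle": 1.0 - arrival_p, "prefill": arrival_p}
        else:
            nxt = BUSY_TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return trace


def aggregate(traces):
    """Sum equal-length device traces step by step into one load profile."""
    return [sum(step) for step in zip(*traces)]


def energy_wh(trace, dt_s=1.0):
    """Energy of a trace in watt-hours, assuming a fixed sampling interval."""
    return sum(trace) * dt_s / 3600.0


def abs_energy_error(synthetic, measured, dt_s=1.0):
    """Absolute relative energy error of a synthetic trace against a measured one."""
    e_meas = energy_wh(measured, dt_s)
    return abs(energy_wh(synthetic, dt_s) - e_meas) / e_meas


rng = random.Random(0)
# 8 GPUs form a server; 4 such servers form a rack (sizes are arbitrary here).
server = aggregate([gpu_trace(600, rng) for _ in range(8)])
rack = aggregate(
    [aggregate([gpu_trace(600, rng) for _ in range(8)]) for _ in range(4)]
)
```

The same `aggregate` step can be applied again to move from racks to rows and from rows to a facility. Sampling each GPU independently, as done here, ignores the cross-device synchronization that the abstract identifies as a driver of facility demand, so this sketch would likely understate peaks in a real deployment.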

