Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers
For grid and data center operators, workload composition is a key determinant of how AI data centers affect the grid, and it offers a potential lever for smoothing demand.
This paper shows that, in shared-GPU AI data centers, the mix of batch and inference workloads can decouple aggregate power variability from short-horizon ramping: as the inference share increases, variability follows a U-shaped curve while ramping follows a hump-shaped one. The mechanism is that queued batch jobs fill capacity left idle by fluctuating inference demand, which reduces variability but not ramping.
Artificial intelligence (AI) is driving rapid growth in electricity demand, yet the grid-facing power dynamics of AI data centers remain poorly understood. Here we show that, in shared-GPU systems, the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability follows a U-shaped pattern, whereas ramping follows a hump-shaped pattern, particularly under higher loading; the magnitude and turning points of both patterns also depend on system loading. Using a trace-calibrated framework linking workload arrivals, queueing, scheduling, and GPU power, we show that the underlying mechanism is asymmetric. At intermediate workload mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability. However, short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power. AI data centers should therefore be understood as dynamic systems whose workload composition shapes their grid impact.
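The backfilling mechanism described above can be illustrated with a minimal toy sketch. It is not the trace-calibrated framework used in the paper: the function names, the Gaussian inference demand, the exponential batch arrivals, the per-GPU power levels, and the two metrics (standard deviation of power for variability, mean absolute one-step change for ramping) are all assumptions chosen for illustration, and the sketch is not expected to reproduce the reported U-shaped and hump-shaped patterns.

```python
import random
import statistics


def simulate_power(inference_share, load, n_gpus=100, steps=2000, seed=0):
    """Toy discrete-time model of a shared-GPU cluster (illustrative only).

    Inference demand is served immediately, up to capacity. Batch work
    waits in a queue and backfills whatever capacity inference leaves idle.
    All parameter values here are hypothetical, not calibrated to any trace.
    """
    rng = random.Random(seed)
    idle_w, busy_w = 100.0, 400.0  # assumed per-GPU idle and busy power (W)
    mean_inference = inference_share * load * n_gpus
    batch_rate = (1.0 - inference_share) * load * n_gpus
    queue = 0.0
    trace = []
    for _ in range(steps):
        # Fluctuating inference demand, clipped to [0, n_gpus].
        demand = rng.gauss(mean_inference, 0.3 * mean_inference)
        inference_busy = min(max(demand, 0.0), float(n_gpus))
        # Bursty batch arrivals (mean = batch_rate GPU-steps) join the queue.
        if batch_rate > 0:
            queue += rng.expovariate(1.0 / batch_rate)
        # Queued batch work fills the capacity left idle by inference.
        batch_busy = min(queue, n_gpus - inference_busy)
        queue -= batch_busy
        busy = inference_busy + batch_busy
        trace.append(idle_w * n_gpus + (busy_w - idle_w) * busy)
    return trace


def variability_and_ramp(trace):
    """Return (std. dev. of power, mean absolute one-step power change)."""
    ramps = [abs(b - a) for a, b in zip(trace, trace[1:])]
    return statistics.pstdev(trace), statistics.fmean(ramps)


if __name__ == "__main__":
    # Sweep the inference share at a fixed, assumed loading level.
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        v, r = variability_and_ramp(simulate_power(share, load=0.9))
        print(f"share={share:.2f}  variability={v:9.1f} W  ramp={r:9.1f} W")
```

Sweeping `inference_share` and `load` in this way gives one simple view of how the two metrics can be computed from the same power trace; any quantitative conclusion would require the calibrated arrival, queueing, scheduling, and power models described in the paper.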