DC · AI · Apr 30

AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

arXiv:2604.2785544.9

Predicted impact: top 35% in DC (last 90 days) · Originality: synthesis-oriented

AI Analysis

For operators of large-scale AI inference systems, this framework quantifies the trade-offs between latency on one side and energy cost and carbon emissions on the other, enabling more informed placement decisions.
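One way to make "quantifying the trade-off" concrete is to take finite differences along an energy-latency frontier. The paper names energy return on latency and carbon return on latency as metrics but the listing does not define them. The sketch below therefore assumes a plausible definition: cost or carbon saved per extra millisecond of latency budget. `FrontierPoint` and `returns_on_latency` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrontierPoint:
    """One point on a hypothetical energy-latency frontier."""
    latency_budget_ms: float   # latency budget granted to the workload
    energy_cost_usd: float     # minimum energy cost achievable within that budget
    carbon_kg: float           # emissions of the corresponding placement


def returns_on_latency(frontier: List[FrontierPoint]) -> List[Tuple[float, float, float, float]]:
    """Finite-difference returns between consecutive frontier points.

    Assumed definitions (illustrative, not taken from the paper):
      ERoL = cost saved per extra ms of budget     = -d(cost)   / d(budget)
      CRoL = carbon avoided per extra ms of budget = -d(carbon) / d(budget)
    Returns (budget_from, budget_to, erol, crol) tuples.
    """
    pts = sorted(frontier, key=lambda p: p.latency_budget_ms)
    out = []
    for a, b in zip(pts, pts[1:]):
        d_budget = b.latency_budget_ms - a.latency_budget_ms
        if d_budget <= 0:
            continue  # duplicate budgets carry no marginal information
        erol = (a.energy_cost_usd - b.energy_cost_usd) / d_budget
        crol = (a.carbon_kg - b.carbon_kg) / d_budget
        out.append((a.latency_budget_ms, b.latency_budget_ms, erol, crol))
    return out
```

Consider hypothetical frontier points at 20, 50 and 150 ms with costs of 100, 80 and 70 USD. Here the first 30 ms of relaxation returns about 0.67 USD per ms, and the next 100 ms returns only 0.1 USD per ms. A per-interval return makes this kind of diminishing benefit easy to spot.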

This paper develops an energy-geography framework for geo-distributed AI inference, modeling inference placement as a constrained optimization problem. Simulations show that latency relaxation expands feasible geography, but migration frictions, egress costs, and capacity limits can sharply reduce realized benefits.
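To illustrate what "inference placement as a constrained optimization problem" can look like, here is one linear-programming formulation. It is a sketch in hypothetical notation, not the paper's model. It assumes fractional placement shares, a feasibility mask built from latency budgets and an allow-list, migration frictions charged pro rata, and an optional carbon price $\lambda$:

```latex
% Hypothetical notation (not the paper's): workloads i, regions r.
% E_i : IT-level energy of workload i (kWh)        B_i : latency budget of i (ms)
% p_r : electricity price (USD/kWh)                c_r : marginal carbon intensity (kg/kWh)
% u_r : PUE of region r                            K_r : IT-energy capacity of r (kWh)
% L_ir: latency from i's service node to r         f_ir: migration + egress friction (USD), 0 at home
% A_ir: 1 if legal / state-locality rules allow r  x_ir: share of workload i executed in r
% lambda: carbon price (USD/kg)
\begin{aligned}
M_{ir} &= \mathbf{1}\!\left[L_{ir} \le B_i\right] \cdot A_{ir}
  && \text{(feasibility mask)}\\
C^\star(B) &= \min_{x \ge 0} \sum_i \sum_r x_{ir}\left[E_i\,u_r\,(p_r + \lambda c_r) + f_{ir}\right]\\
\text{s.t.}\quad & \sum_r x_{ir} = 1 \;\; \forall i,
  \qquad \sum_i x_{ir} E_i \le K_r \;\; \forall r,
  \qquad x_{ir} \le M_{ir} \;\; \forall i,r.
\end{aligned}
```

Relaxing any budget $B_i$ can only switch mask entries from 0 to 1. The feasible set therefore grows and $C^\star(B)$ is non-increasing in $B$. Tracing $C^\star$ as budgets loosen gives one reading of the energy-latency frontier. The frictions $f_{ir}$ and capacities $K_r$ then determine how much of that theoretical gain is actually realized. A fixed per-migration charge, rather than a pro-rata one, would require binary variables instead of this pure LP.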

AI inference is becoming a persistent and geographically distributed source of electricity demand. Unlike many traditional electrical loads, inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable. This paper studies when such digital relocation of computation can be interpreted as latency-constrained relocation of electricity demand.

We develop an energy-geography framework for geo-distributed AI inference. The framework models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions. The key object is the energy-latency frontier: the marginal cost and carbon benefit unlocked by relaxing inference latency budgets.

The paper makes four contributions. First, it distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it formulates a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition. Fourth, it provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers.

The results show that latency relaxation expands feasible geography, while migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits.
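The split into local, regional and energy-oriented layers can be reproduced qualitatively with a small greedy heuristic. This is a simplification, not the paper's optimizer or simulation. Each workload is sent to the cheapest region that passes a latency feasibility mask and still has capacity. A flat friction (standing in for migration and egress costs) is charged for leaving the home region, so a move happens only when the energy saving exceeds that friction, which is one simple form of a break-even condition. All class, function and parameter names, and the cost model itself, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    price_usd_per_kwh: float    # electricity price
    carbon_kg_per_kwh: float    # marginal carbon intensity
    pue: float                  # power usage effectiveness (>= 1.0)
    capacity_kwh: float         # IT-energy capacity available in the period


@dataclass(frozen=True)
class Workload:
    name: str
    it_energy_kwh: float        # IT-level energy the workload needs
    latency_budget_ms: float    # maximum acceptable round-trip latency
    home: str                   # region nearest the user-facing service node


def effective_cost(w: Workload, r: Region, friction_usd: float, carbon_price: float) -> float:
    """Facility-level energy cost (plus optional carbon price) plus a flat relocation friction."""
    facility_kwh = w.it_energy_kwh * r.pue
    energy = facility_kwh * (r.price_usd_per_kwh + carbon_price * r.carbon_kg_per_kwh)
    # Staying home is free; leaving incurs the assumed migration + egress friction.
    return energy + (0.0 if r.name == w.home else friction_usd)


def place(
    workloads: List[Workload],
    regions: List[Region],
    latency_ms: Dict[Tuple[str, str], float],
    friction_usd: float,
    carbon_price: float = 0.0,
) -> Dict[str, Optional[str]]:
    """Greedy placement: cheapest region that passes the latency mask and has spare capacity."""
    remaining = {r.name: r.capacity_kwh for r in regions}
    plan: Dict[str, Optional[str]] = {}
    # Place the tightest-budget workloads first so they are not crowded out of local capacity.
    for w in sorted(workloads, key=lambda wl: wl.latency_budget_ms):
        feasible = [
            r for r in regions
            if latency_ms[(w.home, r.name)] <= w.latency_budget_ms   # feasibility mask
            and remaining[r.name] >= w.it_energy_kwh                 # capacity limit
        ]
        if not feasible:
            plan[w.name] = None  # cannot be served under these constraints
            continue
        best = min(feasible, key=lambda r: effective_cost(w, r, friction_usd, carbon_price))
        remaining[best.name] -= w.it_energy_kwh
        plan[w.name] = best.name
    return plan
```

With made-up inputs (a home region at 0.20 USD/kWh, a regional site 30 ms away at 0.12 USD/kWh, and a remote site 150 ms away at 0.05 USD/kWh), a small friction yields the three layers: a 20 ms chat workload stays home, a 50 ms batch workload goes regional, and a 500 ms offline workload goes remote. Raising the friction high enough keeps everything at home. Shrinking the remote site's capacity pushes the offline workload back to the regional site. Both effects mirror, qualitatively, the frictions and capacity limits the abstract warns about.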

