CY | AI | May 14, 2025

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

arXiv:2505.09598v5 | 101 citations | h-index: 19
Originality: Synthesis-oriented
AI Analysis

The paper addresses the growing environmental sustainability problem facing AI developers and policymakers by providing a standardized benchmarking tool, though its contribution is incremental in that it applies existing methods to new data.

This paper quantifies the environmental impact of LLM inference with a benchmarking framework that measures energy, water, and carbon footprints across 30 models. The most energy-intensive models exceed 29 Wh per long prompt, and even a 0.42 Wh short query, scaled to 700M queries/day, aggregates to annual electricity comparable to that of 35,000 U.S. homes.

This paper introduces an infrastructure-aware benchmarking framework for quantifying the environmental footprint of LLM inference across 30 state-of-the-art models in commercial datacenters. The framework combines public API performance data with company-specific environmental multipliers and statistical inference of hardware configurations. We additionally utilize cross-efficiency Data Envelopment Analysis (DEA) to rank models by performance relative to environmental cost and provide a dynamically updated dashboard that visualizes model-level energy, water, and carbon metrics. Results show the most energy-intensive models exceed 29 Wh per long prompt, over 65 times the most efficient systems. Even a 0.42 Wh short query, when scaled to 700M queries/day, aggregates to annual electricity comparable to 35,000 U.S. homes, evaporative freshwater equal to the annual drinking needs of 1.2M people, and carbon emissions requiring a Chicago-sized forest to offset. These findings highlight a growing paradox: as AI becomes cheaper and faster, global adoption drives disproportionate resource consumption. Our methodology offers a standardized, empirically grounded basis for sustainability benchmarking and accountability in AI deployment.
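
The scaling step in the abstract can be illustrated with a short calculation. The sketch below is not the paper's method. It only multiplies the two figures the abstract gives (0.42 Wh per short query and 700M queries/day) over a year, assuming a constant daily volume. The function name `annual_energy_gwh` and the 365-day constant are illustrative assumptions. The abstract does not state a per-home consumption figure or any growth assumptions, so no homes-equivalent is computed here.

```python
# Naive scaling of a per-query energy figure to an annual total.
# Only the 0.42 Wh and 700M queries/day values come from the abstract;
# the function name and the constant-volume assumption are illustrative.

WH_PER_QUERY = 0.42            # short-query energy in Wh, from the abstract
QUERIES_PER_DAY = 700_000_000  # daily query volume, from the abstract
DAYS_PER_YEAR = 365            # assumption: the same volume every day of the year


def annual_energy_gwh(wh_per_query: float, queries_per_day: float,
                      days: int = DAYS_PER_YEAR) -> float:
    """Return annual energy in GWh for a constant daily query volume."""
    wh_per_year = wh_per_query * queries_per_day * days
    return wh_per_year / 1e9  # 1 GWh = 1e9 Wh


if __name__ == "__main__":
    total = annual_energy_gwh(WH_PER_QUERY, QUERIES_PER_DAY)
    print(f"{total:.2f} GWh/year")
```

Under these assumptions the product is about 107 GWh per year. Any comparison to households, drinking water, or forest offsets would need the additional conversion factors used in the paper, which the abstract does not list.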

