AI · LG · Nov 5, 2025

From Prompts to Power: Measuring the Energy Footprint of LLM Inference

arXiv:2511.05597v1 · 6 citations · h-index: 22 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the environmental impact of generative AI for developers and users, but the advance is incremental because it builds on existing measurement approaches.

The study tackles the high energy consumption of Large Language Model (LLM) inference through a large-scale, measurement-based analysis across 21 GPU configurations and 155 model architectures. It produces a predictive model that accurately estimates energy usage, along with a browser extension to raise awareness.

The rapid expansion of Large Language Models (LLMs) has introduced unprecedented energy demands, extending beyond training to large-scale inference workloads that often dominate total lifecycle consumption. Deploying these models requires energy-intensive GPU infrastructure, and in some cases has even prompted plans to power data centers with nuclear energy. Despite this growing relevance, systematic analyses of inference energy consumption remain limited. In this work, we present a large-scale measurement-based study comprising over 32,500 measurements across 21 GPU configurations and 155 model architectures, from small open-source models to frontier systems. Using the vLLM inference engine, we quantify energy usage at the prompt level and identify how architectural and operational factors shape energy demand. Building on these insights, we develop a predictive model that accurately estimates inference energy consumption across unseen architectures and hardware, and implement it as a browser extension to raise awareness of the environmental impact of generative AI.
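To make "energy usage at the prompt level" concrete, here is a minimal sketch of how per-prompt energy could be derived from GPU power readings. It is not the authors' measurement pipeline, which the abstract does not describe. The function names (`energy_joules`, `joules_to_wh`, `joules_per_token`), the sample format, and the example numbers are all assumptions made for illustration. The only fact relied on is that energy is the time integral of power, approximated here with the trapezoidal rule.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, GPU power draw in watts).
PowerSample = Tuple[float, float]


def energy_joules(samples: Sequence[PowerSample]) -> float:
    """Approximate energy (J) as the trapezoidal integral of power over time."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        # Area of one trapezoid: mean power times elapsed time.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def joules_to_wh(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


def joules_per_token(joules: float, n_tokens: int) -> float:
    """Normalize energy by the number of generated tokens."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return joules / n_tokens


if __name__ == "__main__":
    # Illustrative readings taken while one prompt is served (made-up values).
    samples = [(0.0, 100.0), (1.0, 300.0), (2.0, 300.0), (3.0, 100.0)]
    e = energy_joules(samples)
    print(f"{e:.1f} J, {joules_to_wh(e):.4f} Wh, "
          f"{joules_per_token(e, 50):.1f} J/token")
```

In practice the samples would come from a power-monitoring interface polled while a single request is being served. A study would also need to decide whether to subtract idle power and how to attribute energy when several prompts are batched together.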

