SEAI, Sep 10, 2025

Benchmarking Energy Efficiency of Large Language Models Using vLLM

arXiv:2509.08867v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for developers to build more sustainable AI systems by providing better energy efficiency insights, though it is incremental as it builds on existing benchmarking efforts.

The paper tackles the lack of realistic energy efficiency benchmarks for Large Language Models (LLMs) by introducing the LLM Efficiency Benchmark, which uses vLLM to simulate production scenarios. It shows that factors such as model size and concurrent request volume affect inference energy efficiency.

The prevalence of Large Language Models (LLMs) is having a growing impact on the climate due to the substantial energy required for their deployment and use. To raise awareness among developers who are implementing LLMs in their products, there is a strong need to collect more information about the energy efficiency of LLMs. While existing research has evaluated the energy efficiency of various models, these benchmarks often fall short of representing realistic production scenarios. In this paper, we introduce the LLM Efficiency Benchmark, designed to simulate real-world usage conditions. Our benchmark utilizes vLLM, a high-throughput, production-ready LLM serving backend that optimizes model performance and efficiency. We examine how factors such as model size, architecture, and concurrent request volume affect inference energy efficiency. Our findings demonstrate that it is possible to create energy efficiency benchmarks that better reflect practical deployment conditions, providing valuable insights for developers aiming to build more sustainable AI systems.
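The abstract does not state how energy efficiency is quantified, so the following is only a minimal sketch of one common way to express it: integrate sampled power draw over a benchmark run to get energy in joules, then divide by the number of generated tokens. The function names, the sampling format, and the joules-per-token metric are illustrative assumptions, not the paper's method or any vLLM API.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper: assumes power was sampled externally during a run.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")

    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_token(
    timestamps_s: Sequence[float],
    power_w: Sequence[float],
    generated_tokens: int,
) -> float:
    """Assumed efficiency metric: energy consumed per generated token."""
    if generated_tokens <= 0:
        raise ValueError("generated_tokens must be positive")
    return energy_joules(timestamps_s, power_w) / generated_tokens


if __name__ == "__main__":
    # Made-up samples for illustration only, not measured data.
    ts = [0.0, 1.0, 2.0]
    watts = [100.0, 200.0, 100.0]
    print(energy_joules(ts, watts))          # 300.0
    print(joules_per_token(ts, watts, 150))  # 2.0
```

Under these assumptions, repeating the calculation for different model sizes or concurrent request volumes would give comparable per-token figures, which is the kind of comparison the abstract describes.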

