SE | AI | May 23, 2025

Evaluating the Energy-Efficiency of the Code Generated by LLMs

Md Arman Islam, Devi Varaprasad Jonnala, Ritika Rekhi, Pratik Pokharel, Siddharth Cilamkoti, Asif Imran, Tevfik Kosar, Bekir Turkkan

arXiv:2505.20324v1 | 13.89 citations | h-index: 9 | 2025 3rd International Conference on Foundation and Large Language Models (FLLM)

Originality: Synthesis-oriented

AI Analysis

The paper addresses the overlooked environmental impact of LLM-generated code in the software industry. Its contribution is incremental, however, because it benchmarks existing models rather than proposing new methods.

This paper evaluates the energy efficiency of code generated by 20 popular LLMs for 878 programming problems, finding that LLM-produced solutions are often far less energy-efficient than human-written ones; in specific algorithmic groups, LLM-generated code consumes up to 450 times more energy.

As the quality of code generated by Large Language Models (LLMs) improves, their adoption in the software industry for automated code generation continues to grow. Researchers primarily focus on enhancing the functional correctness of the generated code while commonly overlooking its energy efficiency and environmental impact. This paper investigates the energy efficiency of the code generated by 20 popular LLMs for 878 programming problems of varying difficulty levels and diverse algorithmic categories selected from the LeetCode platform by comparing them against canonical human-written solutions. Although LLMs can produce functionally correct results in most cases, our findings show that the performance and energy efficiency of LLM-produced solutions are often far below those of human-written solutions. Among the studied LLMs, DeepSeek-v3 and GPT-4o generate the most energy-efficient code, whereas Grok-2 and Gemini-1.5-Pro are among the least energy-efficient models. On average, human-generated canonical solutions are approximately 1.17 times more energy efficient than DeepSeek-v3, 1.21 times more energy efficient than GPT-4o, and over 2 times more energy efficient than Grok-2 and Gemini-1.5-Pro. For specific algorithmic groups such as dynamic programming, backtracking, and bit manipulation, LLM-generated code can consume up to 450 times more energy than human-generated canonical solutions.
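The "times more energy efficient" figures above can be read as ratios of energy consumed by an LLM-generated solution to energy consumed by the canonical solution for the same problem. The following is a minimal sketch of that arithmetic, not the paper's measurement pipeline: the function names, the per-problem averaging scheme, and the sample readings are all assumptions made for illustration, and the abstract does not state how the authors aggregate across problems.

```python
from statistics import mean


def energy_ratio(llm_joules: float, human_joules: float) -> float:
    """Return how many times more energy the LLM solution used than the baseline."""
    if human_joules <= 0:
        raise ValueError("baseline energy must be positive")
    return llm_joules / human_joules


def mean_energy_ratio(measurements: list[tuple[float, float]]) -> float:
    """Average the per-problem ratios over (llm_joules, human_joules) pairs.

    Assumption: ratios are averaged per problem. A ratio of total energies
    would be an equally plausible aggregate and can give a different number.
    """
    return mean(energy_ratio(llm, human) for llm, human in measurements)


# Hypothetical placeholder readings in joules; these are NOT data from the paper.
sample = [(2.0, 1.0), (3.0, 2.0), (1.0, 1.0)]
ratio = mean_energy_ratio(sample)
print(f"LLM code used {ratio:.2f}x the energy of the canonical solutions")
```

Under this reading, a value of 1.17 would mean the LLM-generated code consumed about 17% more energy than the human-written baseline on average.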
