AI · Sep 26, 2025

Ground-Truthing AI Energy Consumption: Validating CodeCarbon Against External Measurements

arXiv:2509.22092v1 · 12 citations · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the need for reliable quantification of AI energy consumption to support sustainable development. Its contribution is incremental, since it validates existing tools rather than proposing new ones.

This study evaluated the accuracy of AI energy estimation tools such as CodeCarbon by comparing their estimates with ground-truth measurements across hundreds of experiments, and found errors of up to 40%.

Although machine learning (ML) and artificial intelligence (AI) present fascinating opportunities for innovation, their rapid development also has a significant impact on our environment. In response to growing resource awareness in the field, quantification tools such as the ML Emissions Calculator and CodeCarbon were developed to estimate the energy consumption and carbon emissions of running AI models. They are easy to incorporate into AI projects, but they also make pragmatic assumptions and neglect important factors, which raises the question of how accurate their estimates are. This study systematically evaluates the reliability of static and dynamic energy estimation approaches by comparing them with ground-truth measurements across hundreds of AI experiments. Based on the proposed validation framework, it provides investigative insights into AI energy demand and estimation inaccuracies. While the established estimation approaches generally follow the patterns of AI energy consumption, they are shown to consistently make errors of up to 40%. By providing empirical evidence on energy estimation quality and errors, this study establishes transparency and validates widely used tools for sustainable AI development. It also formulates guidelines for improving the state of the art and offers code for extending the validation to other domains and tools, thus making important contributions to resource-aware ML and AI sustainability research.
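To make the comparison concrete, here is a minimal sketch of what validating an estimate against a ground-truth reading can look like. It assumes that a "static" approach multiplies a nominal power rating (for example a TDP figure) by runtime, that a "dynamic" approach integrates sampled power readings over time, and that accuracy is expressed as signed relative error against an external meter. The function names (`static_estimate_kwh`, `dynamic_estimate_kwh`, `relative_error`) and all numbers are hypothetical and purely illustrative. They are not taken from the paper, and this does not reproduce CodeCarbon's actual measurement backends; the paper's exact formulations may differ.

```python
"""Hypothetical sketch: compare static and dynamic energy estimates
against a ground-truth meter reading using signed relative error."""

from typing import Sequence

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules (W*s)


def static_estimate_kwh(nominal_power_w: float, runtime_s: float) -> float:
    """Assumed static approach: constant nominal power times runtime."""
    return nominal_power_w * runtime_s / JOULES_PER_KWH


def dynamic_estimate_kwh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Assumed dynamic approach: trapezoidal integration of sampled power."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times its duration gives energy in J.
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / JOULES_PER_KWH


def relative_error(estimate_kwh: float, measured_kwh: float) -> float:
    """Signed relative error; positive means the estimate is too high."""
    if measured_kwh <= 0:
        raise ValueError("ground-truth energy must be positive")
    return (estimate_kwh - measured_kwh) / measured_kwh


if __name__ == "__main__":
    # Illustrative numbers only (not results from the study).
    measured = 0.25  # hypothetical external meter reading in kWh
    static = static_estimate_kwh(nominal_power_w=300.0, runtime_s=3600.0)
    dynamic = dynamic_estimate_kwh([0.0, 1800.0, 3600.0], [200.0, 220.0, 240.0])
    print(f"static:  {static:.3f} kWh, error {relative_error(static, measured):+.0%}")
    print(f"dynamic: {dynamic:.3f} kWh, error {relative_error(dynamic, measured):+.0%}")
```

Keeping the error signed, rather than taking its absolute value, shows whether an approach systematically over- or underestimates, which a single "up to X%" figure does not reveal on its own.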

