A robust methodology for long-term sustainability evaluation of Machine Learning models
This work addresses the problem of inefficient and non-standardized sustainability assessments for AI systems by proposing a new evaluation protocol, an incremental contribution.
The paper tackles the lack of standardized evaluation protocols for assessing the long-term sustainability of machine learning models, showing that traditional static evaluations fail to capture sustainability under evolving data and that higher environmental cost often yields little performance benefit.
Sustainability and efficiency have become essential considerations in the development and deployment of Artificial Intelligence systems, yet existing regulatory and reporting practices lack standardized, model-agnostic evaluation protocols. Current assessments often measure only short-term experimental resource usage and disproportionately emphasize batch learning settings, failing to reflect real-world, long-term AI lifecycles. In this work, we propose a comprehensive evaluation protocol for assessing the long-term sustainability of ML models, applicable to both batch and streaming learning scenarios. Through experiments on diverse classification tasks using a range of model types, we demonstrate that traditional static train-test evaluations do not reliably capture sustainability under evolving data and repeated model updates. Our results show that long-term sustainability varies significantly across models, and in many cases, higher environmental cost yields little performance benefit.
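To make the contrast between a static train-test evaluation and a long-term evaluation concrete, the following is a minimal sketch under stated assumptions, not the protocol proposed in the paper. The toy `CentroidClassifier`, the synthetic drifting data from `make_chunk`, and the use of "training samples processed" as a stand-in for environmental cost are all hypothetical choices made for illustration; the abstract does not specify the models, datasets, or cost metrics used.

```python
import random


class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier (hypothetical stand-in for any model)."""

    def __init__(self):
        self.centroids = {}

    def fit(self, xs, ys):
        # Recompute one centroid per class from scratch.
        sums, counts = {}, {}
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        # Assumed cost proxy: number of samples processed during training.
        return len(xs)

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def make_chunk(rng, shift, n=200):
    """Two classes centred at `shift` and `shift + 2`; `shift` models data drift."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        xs.append(rng.gauss(shift + 2.0 * y, 0.5))
        ys.append(y)
    return xs, ys


def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def evaluate_long_term(chunks, retrain):
    """Train on the first chunk, then test on each later chunk before
    optionally retraining on it. Returns (mean accuracy, cumulative cost)."""
    model = CentroidClassifier()
    cost = model.fit(*chunks[0])
    accs = []
    for xs, ys in chunks[1:]:
        accs.append(accuracy(model, xs, ys))  # test first
        if retrain:
            cost += model.fit(xs, ys)  # then update
    return sum(accs) / len(accs), cost


rng = random.Random(0)
chunks = [make_chunk(rng, shift=0.3 * t) for t in range(10)]

# Static view: one train chunk, one test chunk.
static_acc, static_cost = evaluate_long_term(chunks[:2], retrain=False)
# Long-term views over the whole sequence, without and with repeated updates.
frozen_acc, frozen_cost = evaluate_long_term(chunks, retrain=False)
updated_acc, updated_cost = evaluate_long_term(chunks, retrain=True)

print("static :", static_acc, static_cost)
print("frozen :", frozen_acc, frozen_cost)
print("updated:", updated_acc, updated_cost)
```

In this toy setting the static score reflects only the first test chunk, while the long-term runs expose both the accuracy lost by a frozen model as the data drifts and the cumulative cost of retraining. A real study would presumably replace the sample-count proxy with measured energy or emissions, which this sketch does not attempt.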