May 1

The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining

arXiv:2605.01158 | 98.6 | h-index: 28
AI Analysis

This work highlights the largely unreported environmental costs of post-training stages in language model development, which are growing rapidly and must be accounted for in environmental reporting standards.

This paper provides the first detailed breakdown of the environmental impact of a full language model development pipeline, including pretraining, supervised fine-tuning, preference optimization, and reinforcement learning, for Olmo 3 models. It finds that reasoning models are 17x more expensive to post-train than instruction-tuned models, development costs account for 82.2% of total compute, and the total process consumed ~12.3 GWh of energy, emitted 4,251 tCO2eq, and used 15,887 kL of water.

Modern language model development extends far beyond pretraining, yet environmental reporting remains narrowly focused on the cost of training a single final model. In this work, we provide the first detailed breakdown of the environmental impact of a full model development pipeline, from pretraining through supervised fine-tuning, preference optimization, and reinforcement learning, for Olmo 3, a family of 7 billion and 32 billion parameter models in both instruction-following and reasoning variants. We find that reasoning models are 17x more expensive to post-train than their instruction-tuned counterparts in terms of datacenter energy, driven by reinforcement learning rollout generation. Development costs (including experimentation, failed runs, and ablations) account for 82.2% of total compute, a roughly 65% increase over the ~50% reported for pretraining-focused pipelines in prior work. In total, we estimate our model development process consumed ~12.3 GWh of datacenter energy, emitted 4,251 tCO2eq, and consumed 15,887 kL of water, with water consumption driven entirely by power generation infrastructure rather than data center cooling. These costs, which are almost entirely unreported by model developers, are growing rapidly as post-training pipelines become more complex, and must be accounted for in environmental reporting standards and by the research community working to reduce AI's environmental impact.
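The headline totals in the abstract can be turned into average intensities with simple arithmetic. The sketch below is illustrative only: the input figures are copied from the abstract (the energy value is stated there as approximate), while the derived per-kWh intensities are implied averages computed here, not numbers reported in the abstract. The function names are hypothetical. It also shows that the "roughly 65% increase" in development share is a relative change from 50% to 82.2%, not a percentage-point difference.

```python
# Totals quoted in the abstract. The energy figure is given there as "~12.3 GWh",
# so everything derived from it is approximate.
ENERGY_GWH = 12.3           # datacenter energy
EMISSIONS_TCO2EQ = 4251.0   # tonnes of CO2-equivalent
WATER_KL = 15887.0          # kilolitres of water
DEV_SHARE_PCT = 82.2        # development share of total compute (this work)
PRIOR_DEV_SHARE_PCT = 50.0  # approximate share cited from prior work


def implied_intensities(energy_gwh, emissions_t, water_kl):
    """Return (gCO2eq per kWh, litres of water per kWh) averaged over the total."""
    energy_kwh = energy_gwh * 1e6            # 1 GWh = 1,000,000 kWh
    grams_co2 = emissions_t * 1e6            # 1 tonne = 1,000,000 g
    litres = water_kl * 1e3                  # 1 kL = 1,000 L
    return grams_co2 / energy_kwh, litres / energy_kwh


def relative_increase_pct(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


carbon_g_per_kwh, water_l_per_kwh = implied_intensities(
    ENERGY_GWH, EMISSIONS_TCO2EQ, WATER_KL
)
dev_increase = relative_increase_pct(DEV_SHARE_PCT, PRIOR_DEV_SHARE_PCT)

if __name__ == "__main__":
    print(f"Implied carbon intensity: {carbon_g_per_kwh:.1f} gCO2eq/kWh")  # about 345.6
    print(f"Implied water intensity:  {water_l_per_kwh:.2f} L/kWh")        # about 1.29
    print(f"Relative increase in development share: {dev_increase:.1f}%")  # about 64.4
```

These averages blend every stage and site in the pipeline, so they should be read as a sanity check on the totals rather than as the emission or water factors the authors used.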

