Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study
This work addresses the high energy consumption and carbon emissions of LLM training by using renewable energy that would otherwise be wasted, offering an approach for organizations seeking more sustainable AI development.
This study investigates the feasibility of pretraining large language models (LLMs) by aligning training with renewable energy curtailment windows. The prototype system trained a 561M-parameter transformer model across three geo-distributed GPU clusters. Preliminary results show operational emissions reduced to 5-12% of single-site baselines while training quality is preserved.
Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to curtailment, the deliberate reduction of clean generation, in which electricity that could have been produced is wasted. These periods represent an opportunity: if training is aligned with curtailment windows, LLMs can be pretrained using electricity that is both clean and cheap. This technical report presents a system that performs full-parameter LLM training across geo-distributed GPU clusters during regional curtailment windows, elastically switching between local single-site training and federated multi-site synchronization as sites become available or unavailable. Our prototype trains a 561M-parameter transformer model across three clusters using the Flower federated learning framework, with curtailment periods derived from real-world marginal carbon intensity traces. Preliminary results show that curtailment-aware scheduling preserves training quality while reducing operational emissions to 5-12% of single-site baselines.
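To make the scheduling idea concrete, the following is a minimal sketch of how curtailment windows might be derived from an hourly marginal carbon intensity trace and how the training mode might be chosen from the set of available sites. It is an illustration under stated assumptions, not the system described in this report: the fixed threshold, the site names, the toy trace values, and the "paused" state when no site is available are all hypothetical, and the sketch does not use the Flower API.

```python
from typing import Dict, List, Tuple

# Assumption: an hour counts as "curtailed" when marginal carbon intensity
# (gCO2/kWh) is at or below this hypothetical threshold.
CURTAILMENT_THRESHOLD = 50.0


def curtailment_windows(
    trace: List[float], threshold: float = CURTAILMENT_THRESHOLD
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) hour ranges where intensity <= threshold."""
    windows: List[Tuple[int, int]] = []
    start = None
    for hour, value in enumerate(trace):
        if value <= threshold and start is None:
            start = hour  # a window opens
        elif value > threshold and start is not None:
            windows.append((start, hour))  # the window closes
            start = None
    if start is not None:
        windows.append((start, len(trace)))  # window runs to the end of the trace
    return windows


def active_sites(
    traces: Dict[str, List[float]],
    hour: int,
    threshold: float = CURTAILMENT_THRESHOLD,
) -> List[str]:
    """Sites whose region is inside a curtailment window at the given hour."""
    return sorted(s for s, trace in traces.items() if trace[hour] <= threshold)


def training_mode(sites: List[str]) -> str:
    """Pick a mode from the number of available sites (hypothetical policy)."""
    if not sites:
        return "paused"      # assumption: nothing trains outside curtailment
    if len(sites) == 1:
        return "local"       # single-site training
    return "federated"       # multi-site synchronization


# Toy hourly traces for three hypothetical sites (values are made up).
TRACES = {
    "site_a": [400.0, 30.0, 20.0, 300.0],
    "site_b": [500.0, 450.0, 10.0, 20.0],
    "site_c": [600.0, 600.0, 40.0, 700.0],
}

schedule = [training_mode(active_sites(TRACES, h)) for h in range(4)]
```

With these toy traces, the schedule moves from paused to local training, then to federated training when all three sites are curtailed at once, and back to local training when only one site remains. A real deployment would also need to decide how model state is reconciled when a site joins or leaves, which this sketch leaves out.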