LG · DC · Dec 31, 2024

Towards Sustainable Large Language Model Serving

arXiv:2501.01990v1 · 29 citations · h-index: 4 · ACM SIGEnergy Energy Informatics Review
Originality: Synthesis-oriented
AI Analysis

This paper addresses the environmental impact of AI systems for researchers and practitioners deploying LLMs, though it appears incremental because it applies existing modeling approaches to new data.

The paper tackles the problem of carbon emissions from large language model serving by characterizing performance and energy consumption across different model sizes and GPU types, and modeling both operational and embodied emissions based on grid regions and hardware specifications. The result provides insights for optimizing sustainable LLM serving systems by considering both emission types simultaneously.

In this work, we study LLMs from a carbon emission perspective, addressing both operational and embodied emissions, and paving the way for sustainable LLM serving. We characterize the performance and energy of LLaMA with 1B, 3B, and 7B parameters using two Nvidia GPU types, a latest-generation RTX6000 Ada and an older-generation T4. We analytically model operational carbon emissions based on energy consumption and carbon intensities from three grid regions, each representing a different energy source mix, and embodied carbon emissions based on chip area and memory size. Our characterization and modeling provide us with an in-depth understanding of the performance, energy, and carbon emissions of LLM serving. Our findings highlight the potential for optimizing sustainable LLM serving systems by considering both operational and embodied carbon emissions simultaneously.
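The two emission terms described in the abstract can be illustrated with a minimal sketch. It assumes operational emissions are measured energy multiplied by grid carbon intensity, and that embodied emissions scale linearly with chip area and memory size and are amortized over an assumed device lifetime. The function names, the linear form, and every coefficient below are hypothetical placeholders, not the paper's actual model or data.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def operational_carbon_g(energy_joules: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions (gCO2e) = energy (kWh) x grid intensity (gCO2e/kWh)."""
    return energy_joules / JOULES_PER_KWH * intensity_g_per_kwh


@dataclass
class Gpu:
    die_area_cm2: float    # chip area
    memory_gb: float       # on-board memory size
    lifetime_hours: float  # assumed service life used for amortization


def embodied_carbon_g(gpu: Gpu, g_per_cm2: float, g_per_gb: float) -> float:
    """Total embodied emissions, assumed linear in chip area and memory size."""
    return gpu.die_area_cm2 * g_per_cm2 + gpu.memory_gb * g_per_gb


def amortized_embodied_g(gpu: Gpu, g_per_cm2: float, g_per_gb: float,
                         busy_seconds: float) -> float:
    """Share of embodied emissions attributed to a workload by time in use."""
    total = embodied_carbon_g(gpu, g_per_cm2, g_per_gb)
    return total * busy_seconds / (gpu.lifetime_hours * 3600.0)


if __name__ == "__main__":
    # Placeholder values for illustration only; not measurements from the paper.
    gpu = Gpu(die_area_cm2=2.0, memory_gb=16.0, lifetime_hours=10_000.0)
    op = operational_carbon_g(energy_joules=3.6e6, intensity_g_per_kwh=400.0)
    emb = amortized_embodied_g(gpu, g_per_cm2=1000.0, g_per_gb=100.0,
                               busy_seconds=3600.0)
    print(f"operational: {op:.2f} gCO2e, embodied share: {emb:.2f} gCO2e")
```

Summing the two terms per request makes the trade-off visible: a newer GPU may lower the energy per request while carrying a larger embodied share, and a cleaner grid shrinks only the operational term.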

