CY AI May 27, 2025

Responsible Data Stewardship: Generative AI and the Digital Waste Problem

arXiv:2505.21720v1
Originality: Synthesis-oriented
AI Analysis

The paper addresses the environmental consequences of indefinite synthetic data storage for the AI community, framing the issue as an incremental extension of existing AI ethics.

This paper identifies digital waste from generative AI systems as an understudied sustainability problem, proposing recommendations to mitigate its environmental impact by expanding AI ethics to include intergenerational environmental justice.

As generative AI systems become widely adopted, they enable unprecedented levels of synthetic data creation across text, image, audio, and video modalities. While research has addressed the energy consumption of model training and inference, a critical sustainability challenge remains understudied: digital waste. This term refers to stored data that consumes resources without serving a specific (and/or immediate) purpose. This paper presents this terminology in the AI context and introduces digital waste as an ethical imperative within (generative) AI development, positioning environmental sustainability as core to responsible innovation. Drawing from established digital resource management approaches, we examine how other disciplines manage digital waste and identify transferable approaches for the AI community. We propose specific recommendations encompassing research directions, technical interventions, and cultural shifts to mitigate the environmental consequences of indefinite data storage. By expanding AI ethics beyond immediate concerns like bias and privacy to include intergenerational environmental justice, this work contributes to a more comprehensive ethical framework that considers the complete lifecycle impact of generative AI systems.

