CY May 7
How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI
Sophia N. Wilson, Sebastian Mair, Mophat Okinyi et al.
Large-scale data has fuelled the success of frontier artificial intelligence (AI) models over the past decade. This expansion has relied on sustained efforts by large technology corporations to aggregate and curate internet-scale datasets. In this work, we examine the environmental, social, and economic costs of large-scale data in AI through a sustainability lens. We argue that the field is shifting from building models from data to actively creating data for building models. We characterise this transition as hyper-datafication, which marks a critical juncture for the future of frontier AI and its societal impacts. To quantify and contextualise data-related costs, we analyse approximately 550,000 datasets from the Hugging Face Hub, focusing on dataset growth, storage-related energy consumption and carbon footprint, and societal representation using language data. We complement this analysis with qualitative responses from data workers in Kenya to examine the labour involved, including direct employment by big tech corporations and exposure to graphic content. We further draw on external data sources to substantiate our findings by illustrating the global disparity in data centre infrastructure. Our analyses reveal that hyper-datafication does not merely increase resource consumption but systematically redistributes environmental burdens, labour risks, and representational harms toward the Global South, precarious data workers, and under-represented cultures. Thus, we propose Data PROOFS recommendations spanning provenance, resource awareness, ownership, openness, frugality, and standards to mitigate these costs. Our work aims to make visible the often-overlooked costs of data that underpin frontier AI and to stimulate broader debate within the research community and beyond.
CY May 11
Mapping Data Labour Supply Chain in Africa in an Era of Digital Apartheid: a Struggle for Recognition
Jessica Pidoux, Mariame Tighanimine, Sofia Kypraiou et al.
Content moderation and data annotation work has shifted to the Global South, particularly Africa, where workers at business process outsourcing (BPO) companies operate under precarity to serve Global North needs. We address the invisibility of this data labour supply chain and the underdocumented working conditions of its workforce. Drawing on a participatory collaboration between academics, an NGO, and a union, we conducted desk research and deployed a questionnaire (n=81) attuned to unions' organising goals. Our findings show that data labour spans 43 out of 55 African countries and involves 17 major firms serving predominantly North-American and European clients. Workers are employed on short-term contracts, under psychological stress and economic instability; these conditions obscure the competences, namely adaptability and resilience, that their work demands. We contribute the first comprehensive map of Africa's data labour industry and demonstrate a methodology that centers workers' collective actions in documenting their conditions, drawing on Honneth's "struggle for recognition" to capture workers' demands for professional and social acknowledgement.
CY May 12
Auditing African Content Moderators' Working Conditions by Using the European General Data Protection Regulation (GDPR)
Mariame Tighanimine, Jessica Pidoux, Sonia Kgomo et al.
In this article, we audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR). We demonstrate its extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned. The results of this approach provide legally grounded evidence of the structural disadvantages faced by content moderators in the Global South, whose exploitative working conditions violate workers' rights. Our work also highlights the benefits of legislation aimed at protecting individuals' data rights as a counterweight to the tech industry's discourse of exceptionalism, which obscures its dependence on BPOs to externalise labour costs and accountability, whilst claiming that its products, business models, and methods of resource extraction are unprecedented and fall outside any existing legal framework.