DC · Apr 25
A Taxonomy and Resolution Strategy for Client-Level Disagreements in Federated Learning
Daan Rosendal, Ana Oprescu
Federated Learning (FL) typically assumes unconditional collaboration, a premise that overlooks the complexities of real-world, multi-stakeholder environments in which clients may need to exclude one another for strategic, regulatory, or competitive reasons. This paper addresses this gap, which we term 'client-level disagreements,' by first introducing a taxonomy of such scenarios. We then propose a robust, multi-track resolution strategy that guarantees strict client exclusion by creating and managing isolated model update paths ('tracks'), thereby preventing the cross-contamination and unfairness issues present in naive strategies. Through an empirical evaluation of our custom simulation system across 34 scenarios using the MNIST and N-CMAPSS datasets, we validate that our approach correctly handles permanent, temporal, and overlapping disagreement patterns. Our scalability analysis reveals the server-side resolution algorithm's overhead is negligible (<1 ms per round) even under heavy load. The primary scalability constraint is the client-side training load from participating in multiple tracks, a cost that we show can be effectively mitigated by a submodel reuse strategy. This work presents a scalable and architecturally sound method for managing client-level disagreements, and enhances the practical applicability of FL in settings where policy compliance and strategic control are non-negotiable.
CR · Sep 30, 2024
An interdisciplinary exploration of trade-offs between energy, privacy and accuracy aspects of data
Pepijn de Reus, Kyra Dresen, Ana Oprescu et al.
The digital era has raised many societal challenges, including ICT's rising energy consumption and the protection of privacy in personal data processing. This paper considers both aspects in relation to machine learning accuracy in an interdisciplinary exploration. We first present a method to measure the effects of privacy-enhancing techniques on data utility and energy consumption. The environmental-privacy-accuracy trade-offs are discovered through an experimental set-up. We subsequently take a storytelling approach to translate these technical findings to experts in non-ICT fields. We draft two examples for a governmental and auditing setting to contextualise our results. Ultimately, users face the task of optimising their data processing operations in a trade-off between energy, privacy, and accuracy considerations where the impact of their decisions is context-sensitive.
SE · Nov 15, 2024
Generating Energy-efficient code with LLMs
Tom Cappendijk, Pepijn de Reus, Ana Oprescu
The increasing electricity demands of personal computers, communication networks, and data centers contribute to higher atmospheric greenhouse gas emissions, which in turn lead to global warming and climate change. Therefore, the energy consumption of code must be minimized. Code can be generated by large language models. We look at the influence of prompt modification on the energy consumption of the generated code. We use three different Python code problems of varying difficulty levels. Prompt modification is done by adding the sentence "Give me an energy-optimized solution for this problem" or by using two Python coding best practices. The large language models used are CodeLlama-70b, CodeLlama-70b-Instruct, CodeLlama-70b-Python, DeepSeek-Coder-33b-base, and DeepSeek-Coder-33b-instruct. We find a decrease in energy consumption for a specific combination of prompt optimization, LLM, and Python code problem. However, no single optimization prompt consistently decreases energy consumption for the same LLM across the different Python code problems.
SE · Jun 2, 2025
Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices
Luís Cruz, João Paulo Fernandes, Maja H. Kirkeby et al.
The environmental impact of Artificial Intelligence (AI)-enabled systems is increasing rapidly, and software engineering plays a critical role in developing sustainable solutions. The "Greening AI with Software Engineering" CECAM-Lorentz workshop (no. 1358, 2025), funded by the Centre Européen de Calcul Atomique et Moléculaire and the Lorentz Center, provided an interdisciplinary forum for 29 participants, from practitioners to academics, to share knowledge, ideas, practices, and current results dedicated to advancing green software and AI research. The workshop was held February 3-7, 2025, in Lausanne, Switzerland. Through keynotes, flash talks, and collaborative discussions, participants identified and prioritized key challenges for the field. These included energy assessment and standardization, benchmarking practices, sustainability-aware architectures, runtime adaptation, empirical methodologies, and education. This report presents a research agenda emerging from the workshop, outlining open research directions and practical recommendations to guide the development of environmentally sustainable AI-enabled systems rooted in software engineering principles.
CL · Nov 15, 2024
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Pepijn de Reus, Ana Oprescu, Jelle Zuidema
This study examines quantisation and pruning strategies to reduce energy consumption during inference with code Large Language Models (LLMs). Using StarCoder2, we observe increased energy demands with quantisation due to lower throughput and some accuracy losses. Conversely, pruning reduces energy usage but impairs performance. The results highlight challenges and trade-offs in LLM compression. We suggest future work on hardware-optimised quantisation to enhance efficiency with minimal loss in accuracy.
LG · May 11, 2023
Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques
Pepijn de Reus, Ana Oprescu, Koen van Elsen
To address increasing societal concerns regarding privacy and climate, the EU adopted the General Data Protection Regulation (GDPR) and committed to the Green Deal. Considerable research has studied the energy efficiency of software and the accuracy of machine learning models trained on anonymised data sets. Recent work began exploring the impact of privacy-enhancing techniques (PETs) on both the energy consumption and accuracy of the machine learning models, focusing on k-anonymity. As synthetic data is becoming an increasingly popular PET, this paper analyses the energy consumption and accuracy of two phases: a) applying privacy-enhancing techniques to the concerned data set, b) training the models on the concerned privacy-enhanced data set. We use two privacy-enhancing techniques: k-anonymisation (using generalisation and suppression) and synthetic data, and three machine-learning models. Each model is trained on each privacy-enhanced data set. Our results show that models trained on k-anonymised data consume less energy than models trained on the original data, with similar accuracy. Models trained on synthetic data have similar energy consumption and similar or lower accuracy compared to models trained on the original data.