SEJul 15, 2022
Modeling Quality and Machine Learning Pipelines through Extended Feature ModelsGiordano d'Aloisio, Antinisca Di Marco, Giovanni Stilo
The recently increased complexity of Machine Learning (ML) methods, led to the necessity to lighten both the research and industry development processes. ML pipelines have become an essential tool for experts of many domains, data scientists and researchers, allowing them to easily put together several ML models to cover the full analytic process starting from raw datasets. Over the years, several solutions have been proposed to automate the building of ML pipelines, most of them focused on semantic aspects and characteristics of the input dataset. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, etc.) is still missing. In this paper, we first identify, from the literature, key quality attributes of ML systems. Further, we propose a new engineering approach for quality ML pipeline by properly extending the Feature Models meta-model. The presented approach allows to model ML pipelines, their quality requirements (on the whole pipeline and on single phases), and quality characteristics of algorithms used to implement each pipeline phase. Finally, we demonstrate the expressiveness of our model considering the classification problem.
LGSep 20, 2023
Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems DevelopmentFrancesca Marzi, Giordano d'Aloisio, Antinisca Di Marco et al.
The problem of predicting the training time of machine learning (ML) models has become extremely relevant in the scientific community. Being able to predict a priori the training time of an ML model would enable the automatic selection of the best model both in terms of energy efficiency and in terms of performance in the context of, for instance, MLOps architectures. In this paper, we present the work we are conducting towards this direction. In particular, we present an extensive empirical study of the Full Parameter Time Complexity (FPTC) approach by Zheng et al., which is, to the best of our knowledge, the only approach formalizing the training time of ML models as a function of both dataset's and model's parameters. We study the formulations proposed for the Logistic Regression and Random Forest classifiers, and we highlight the main strengths and weaknesses of the approach. Finally, we observe how, from the conducted study, the prediction of training time is strictly related to the context (i.e., the involved dataset) and how the FPTC approach is not generalizable.
21.5LGMar 22
A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification TasksMaryam Boubekraoui, Giordano d'Aloisio, Antinisca Di Marco
The widespread use of AI and ML models in sensitive areas raises significant concerns about fairness. While the research community has introduced various methods for bias mitigation in binary classification tasks, the issue remains under-explored in multi-class classification settings. To address this limitation, in this paper, we first formulate the problem of fair learning in multi-class classification as a multi-objective problem between effectiveness (i.e., prediction correctness) and multiple linear fairness constraints. Next, we propose a Generalised Exponentiated Gradient (GEG) algorithm to solve this task. GEG is an in-processing algorithm that enhances fairness in binary and multi-class classification settings under multiple fairness definitions. We conduct an extensive empirical evaluation of GEG against six baselines across seven multi-class and three binary datasets, using four widely adopted effectiveness metrics and three fairness definitions. GEG overcomes existing baselines, with fairness improvements up to 92% and a decrease in accuracy up to 14%.
SEJan 15, 2025Code
How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion BiasTosin Fadahunsi, Giordano d'Aloisio, Antinisca Di Marco et al.
Generative models are nowadays widely used to generate graphical content used for multiple purposes, e.g. web, art, advertisement. However, it has been shown that the images generated by these models could reinforce societal biases already existing in specific contexts. In this paper, we focus on understanding if this is the case when one generates images related to various software engineering tasks. In fact, the Software Engineering (SE) community is not immune from gender and ethnicity disparities, which could be amplified by the use of these models. Hence, if used without consciousness, artificially generated images could reinforce these biases in the SE domain. Specifically, we perform an extensive empirical evaluation of the gender and ethnicity bias exposed by three versions of the Stable Diffusion (SD) model (a very popular open-source text-to-image model) - SD 2, SD XL, and SD 3 - towards SE tasks. We obtain 6,720 images by feeding each model with two sets of prompts describing different software-related tasks: one set includes the Software Engineer keyword, and one set does not include any specification of the person performing the task. Next, we evaluate the gender and ethnicity disparities in the generated images. Results show how all models are significantly biased towards male figures when representing software engineers. On the contrary, while SD 2 and SD XL are strongly biased towards White figures, SD 3 is slightly more biased towards Asian figures. Nevertheless, all models significantly under-represent Black and Arab figures, regardless of the prompt style used. The results of our analysis highlight severe concerns about adopting those models to generate content for SE tasks and open the field for future research on bias mitigation in this context.
66.8SEMay 8
SafeTune: Search-based Harmfulness Minimisation for Large Language ModelsGiordano d'Aloisio, David Williams, Giusy Annunziata et al.
The widespread adoption of Large Language Models (LLMs) raises concerns about the potential harmfulness of their responses. In this paper, we first investigate the harmfulness of responses from four general-purpose LLMs. Next, we propose SafeTune, a multi-objective search-based approach to mitigate harmfulness while increasing response relevance through hyperparameter tuning and system prompt engineering. Our initial evaluation shows that SafeTune significantly reduces the rate of harmful responses generated by Qwen3.5 0.8B and increases prompt-response relevance (both with a large effect size). Among the parameters we explore, we also find that encouraging greater repetition in responses is most impactful in reducing harmfulness while increasing relevance.
SEDec 18, 2024
On the Compression of Language Models for Code: An Empirical Study on CodeBERTGiordano d'Aloisio, Luca Traini, Federica Sarro et al.
Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various compression strategies to improve the efficiency of language models for code. These strategies aim to optimize inference latency and memory usage, though often at the cost of reduced model effectiveness. However, there is still a significant gap in understanding how these strategies influence the efficiency and effectiveness of language models for code. Here, we empirically investigate the impact of three well-known compression strategies -- knowledge distillation, quantization, and pruning -- across three different classes of software engineering tasks: vulnerability detection, code summarization, and code search. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed. Practitioners and researchers can use these insights to make informed decisions when selecting the most appropriate compression strategy, balancing both efficiency and effectiveness based on their specific needs.
CYOct 21, 2025
REPAIR Approach for Social-based City Reconstruction Planning in case of natural disastersGhulam Mudassir, Antinisca Di Marco, Giordano d'Aloisio
Natural disasters always have several effects on human lives. It is challenging for governments to tackle these incidents and to rebuild the economic, social and physical infrastructures and facilities with the available resources (mainly budget and time). Governments always define plans and policies according to the law and political strategies that should maximise social benefits. The severity of damage and the vast resources needed to bring life back to normality make such reconstruction a challenge. This article is the extension of our previously published work by conducting comprehensive comparative analysis by integrating additional deep learning models plus random agent which is used as a baseline. Our prior research introduced a decision support system by using the Deep Reinforcement Learning technique for the planning of post-disaster city reconstruction, maximizing the social benefit of the reconstruction process, considering available resources, meeting the needs of the broad community stakeholders (like citizens' social benefits and politicians' priorities) and keeping in consideration city's structural constraints (like dependencies among roads and buildings). The proposed approach, named post disaster REbuilding plAn ProvIdeR (REPAIR) is generic. It can determine a set of alternative plans for local administrators who select the ideal one to implement, and it can be applied to areas of any extension. We show the application of REPAIR in a real use case, i.e., to the L'Aquila reconstruction process, damaged in 2009 by a major earthquake.
SESep 21, 2021
Architecture Design for Human-Driven SystemsMahyar T. Moghaddam, Moamin B. Abughazala, Vittorio Cortellessa et al.
This paper highlights humans' social and mobility behaviors' role in the continuous engineering of sustainable socio-technical systems. Our approach relates the humans' characteristics and intentions with the system's goals, and models such interaction. Such a modeling approach aligns the architectural design and associated quality of service (QoS) with humans' quality of experience (QoE). We design a simulation environment that combines agent-based social simulation (ABSS) with architectural models generated through a model-driven engineering approach. Our modeling approach facilitates choosing the best architectural model and system configuration to enhance both the humans' and system's sustainability. We apply our approach to the Uffizi Galleries crowd management system. Taking advantage of real data, we model different scenarios that impact QoE. We then assess various architectural models with different SW/HW configurations to propose the optimal model based on different scenarios concerning QoS-QoE requirements.
SEApr 3, 2014
A model-driven approach to broaden the detection of software performance antipatterns at runtimeAntinisca Di Marco, Catia Trubiani
Performance antipatterns document bad design patterns that have negative influence on system performance. In our previous work we formalized such antipatterns as logical predicates that predicate on four views: (i) the static view that captures the software elements (e.g. classes, components) and the static relationships among them; (ii) the dynamic view that represents the interaction (e.g. messages) that occurs between the software entities elements to provide the system functionalities; (iii) the deployment view that describes the hardware elements (e.g. processing nodes) and the mapping of the software entities onto the hardware platform; (iv) the performance view that collects specific performance indices. In this paper we present a lightweight infrastructure that is able to detect performance antipatterns at runtime through monitoring. The proposed approach precalculates such predicates and identifies antipatterns whose static, dynamic and deployment sub-predicates are validated by the current system configuration and brings at runtime the verification of performance sub-predicates. The proposed infrastructure leverages model-driven techniques to generate probes for monitoring the performance sub-predicates and detecting antipatterns at runtime.