Chaima Boufaied

SE
h-index5
4papers
17citations
Novelty26%
AI Score36

4 Papers

SEJul 31, 2023
A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection

Shan Ali, Chaima Boufaied, Domenico Bianculli et al.

Growth in system complexity increases the need for automated log analysis techniques, such as Log-based Anomaly Detection (LAD). While deep learning (DL) methods have been widely used for LAD, traditional machine learning (ML) techniques can also perform well depending on the context and dataset. Semi-supervised techniques deserve the same attention as they offer practical advantages over fully supervised methods. Current evaluations mainly focus on detection accuracy, but this alone is insufficient to determine the suitability of a technique for a given LAD task. Other aspects to consider include training and prediction times as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. This paper presents a comprehensive empirical study evaluating a wide range of supervised and semi-supervised, traditional and deep ML techniques across four criteria: detection accuracy, time performance, and sensitivity to hyperparameter tuning in both detection accuracy and time performance. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of their detection accuracy and prediction time on most of the benchmark datasets considered in our study. Moreover, overall, sensitivity analysis to hyperparameter tuning with respect to detection accuracy shows that supervised traditional ML techniques are less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.

SEDec 8, 2025
A Gray Literature Study on Fairness Requirements in AI-enabled Software Engineering

Thanh Nguyen, Chaima Boufaied, Ronnie de Souza Santos

Today, with the growing obsession with applying Artificial Intelligence (AI), particularly Machine Learning (ML), to software across various contexts, much of the focus has been on the effectiveness of AI models, often measured through common metrics such as F1- score, while fairness receives relatively little attention. This paper presents a review of existing gray literature, examining fairness requirements in AI context, with a focus on how they are defined across various application domains, managed throughout the Software Development Life Cycle (SDLC), and the causes, as well as the corresponding consequences of their violation by AI models. Our gray literature investigation shows various definitions of fairness requirements in AI systems, commonly emphasizing non-discrimination and equal treatment across different demographic and social attributes. Fairness requirement management practices vary across the SDLC, particularly in model training and bias mitigation, fairness monitoring and evaluation, and data handling practices. Fairness requirement violations are frequently linked, but not limited, to data representation bias, algorithmic and model design bias, human judgment, and evaluation and transparency gaps. The corresponding consequences include harm in a broad sense, encompassing specific professional and societal impacts as key examples, stereotype reinforcement, data and privacy risks, and loss of trust and legitimacy in AI-supported decisions. These findings emphasize the need for consistent frameworks and practices to integrate fairness into AI software, paying as much attention to fairness as to effectiveness.

12.1CLApr 22
Intersectional Fairness in Large Language Models

Chaima Boufaied, Ronnie De Souza Santos, Ann Barcomb

Large Language Models (LLMs) are increasingly deployed in socially sensitive settings, raising concerns about fairness and biases, particularly across intersectional demographic attributes. In this paper, we systematically evaluate intersectional fairness in six LLMs using ambiguous and disambiguated contexts from two benchmark datasets. We assess LLM behavior using bias scores, subgroup fairness metrics, accuracy, and consistency through multi-run analysis across contexts and negative and non-negative question polarities. Our results show that while modern LLMs generally perform well in ambiguous contexts, this limits the informativeness of fairness metrics due to sparse non-unknown predictions. In disambiguated contexts, LLM accuracy is influenced by stereotype alignment, with models being more accurate when the correct answer reinforces a stereotype than when it contradicts it. This pattern is especially pronounced in race-gender intersections, where directional bias toward stereotypes is stronger. Subgroup fairness metrics further indicate that, despite low observed disparity in some cases, outcome distributions remain uneven across intersectional groups. Across repeated runs, responses also vary in consistency, including stereotype-aligned responses. Overall, our findings show that apparent model competence is partly associated with stereotype-consistent cues, and no evaluated LLM achieves consistently reliable or fair behavior across intersectional settings. These findings highlight the need for evaluation beyond accuracy, emphasizing the importance of combining bias, subgroup fairness, and consistency metrics across intersectional groups, contexts, and repeated runs.

SPOct 18, 2019
Signal-Based Properties of Cyber-Physical Systems: Taxonomy and Logic-based Characterization

Chaima Boufaied, Maris Jukss, Domenico Bianculli et al.

The behavior of a cyber-physical system (CPS) is usually defined in terms of the input and output signals processed by sensors and actuators. Requirements specifications of CPSs are typically expressed using signal-based temporal properties. Expressing such requirements is challenging, because of (1) the many features that can be used to characterize a signal behavior; (2) the broad variation in expressiveness of the specification languages (i.e., temporal logics) used for defining signal-based temporal properties. Thus, system and software engineers need effective guidance on selecting appropriate signal behavior types and an adequate specification language, based on the type of requirements they have to define. In this paper, we present a taxonomy of the various types of signal-based properties and provide, for each type, a comprehensive and detailed description as well as a formalization in a temporal logic. Furthermore, we review the expressiveness of state-of-the-art signal-based temporal logics in terms of the property types identified in the taxonomy. Moreover, we report on the application of our taxonomy to classify the requirements specifications of an industrial case study in the aerospace domain, in order to assess the feasibility of using the property types included in our taxonomy and the completeness of the latter.