Alessio Ferrari

SE
h-index: 38
Papers: 15
Citations: 282
Novelty: 28%
AI Score: 40

15 Papers

SE · Feb 9, 2023
Zero-Shot Learning for Requirements Classification: An Exploratory Study

Waad Alhoshan, Alessio Ferrari, Liping Zhao

Context: Requirements engineering researchers have been experimenting with machine learning and deep learning approaches for a range of RE tasks, such as requirements classification, requirements tracing, ambiguity detection, and modelling. However, most of today's ML/DL approaches are based on supervised learning techniques, meaning that they need to be trained using a large amount of task-specific labelled training data. This constraint poses an enormous challenge to RE researchers, as the lack of labelled data makes it difficult for them to fully exploit the benefit of advanced ML/DL technologies. Objective: This paper addresses this problem by showing how a zero-shot learning (ZSL) approach can be used for requirements classification without using any labelled training data. We focus on the classification task because many RE tasks can be framed as classification problems. Method: The ZSL approach used in our study employs contextual word-embeddings and transformer-based language models. We demonstrate this approach through a series of experiments to perform three classification tasks: (1) FR/NFR: classification of functional requirements vs non-functional requirements; (2) NFR: identification of NFR classes; (3) Security: classification of security vs non-security requirements. Results: The study shows that the ZSL approach achieves an F1 score of 0.66 for the FR/NFR task. For the NFR task, the approach yields an F1 of approximately 0.72-0.80, considering the most frequent classes. For the Security task, F1 is approximately 0.66. All of the aforementioned F1 scores are achieved with zero training effort. Conclusion: This study demonstrates the potential of ZSL for requirements classification. An important implication is that it is possible to have very little or no training data to perform classification tasks. The proposed approach thus contributes to the solution of the long-standing problem of data shortage in RE.
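To make the zero-shot idea in this abstract concrete, here is a minimal sketch of label-similarity classification: a requirement is assigned to whichever label description it is most similar to, with no training step. This is not the authors' implementation. The study uses contextual embeddings from transformer-based language models, whereas the `embed` function below is a toy bag-of-words stand-in chosen so the example runs without external libraries, and the label descriptions in `LABELS` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical label descriptions (assumption; not taken from the study).
LABELS = {
    "security": "security access authentication password encryption unauthorized",
    "non-security": "display report search print user interface feature",
}

def embed(text):
    """Toy stand-in for a contextual embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def zero_shot_classify(requirement, labels=LABELS):
    """Return the most similar label and all similarity scores; no training data is used."""
    req_vec = embed(requirement)
    scores = {name: cosine(req_vec, embed(desc)) for name, desc in labels.items()}
    return max(scores, key=scores.get), scores

label, scores = zero_shot_classify(
    "The system shall require a password for authentication."
)
print(label, scores)
```

Swapping `embed` for a real sentence-embedding model would keep the rest of the sketch unchanged, which is the main point: the classifier is defined by the label descriptions rather than by labelled examples.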

CL · Apr 8, 2022
Classification of Natural Language Processing Techniques for Requirements Engineering

Liping Zhao, Waad Alhoshan, Alessio Ferrari et al.

Research in applying natural language processing (NLP) techniques to requirements engineering (RE) tasks spans more than 40 years, from initial efforts carried out in the 1980s to more recent attempts with machine learning (ML) and deep learning (DL) techniques. However, in spite of the progress, our recent survey shows that there is still a lack of systematic understanding and organization of commonly used NLP techniques in RE. We believe one hurdle facing the industry is the lack of shared knowledge of NLP techniques and their usage in RE tasks. In this paper, we present our effort to synthesize and organize the 57 most frequently used NLP techniques in RE. We classify these NLP techniques in two ways: first, by their NLP tasks in typical pipelines and second, by their linguistic analysis levels. We believe these two ways of classification are complementary, contributing to a better understanding of the NLP techniques in RE, and such understanding is crucial to the development of better NLP tools for RE.

19.7 · SE · Mar 10
Class Model Generation from Requirements using Large Language Models

Jackson Nguyen, Rui En Koe, Fanyu Wang et al.

The emergence of Large Language Models (LLMs) has opened new opportunities to automate software engineering activities that traditionally require substantial manual effort. Among these, class diagram generation represents a critical yet resource-intensive phase in software design. This paper investigates the capabilities of state-of-the-art LLMs, including GPT-5, Claude Sonnet 4.0, Gemini 2.5 Flash Thinking, and Llama-3.1-8B-Instruct, to generate UML class diagrams from natural language requirements automatically. To evaluate the effectiveness and reliability of LLM-based model generation, we propose a comprehensive dual-validation framework that integrates an LLM-as-a-Judge methodology with human-in-the-loop assessment. Using eight heterogeneous datasets, we apply chain-of-thought prompting to extract domain entities, attributes, and associations, generating corresponding PlantUML representations. The resulting models are evaluated across five quality dimensions: completeness, correctness, conformance to standards, comprehensibility, and terminological alignment. Two independent LLM judges (Grok and Mistral) perform structured pairwise comparisons, and their judgments are further validated against expert evaluations. Our results demonstrate that LLMs can generate structurally coherent and semantically meaningful UML diagrams, achieving substantial alignment with human evaluators. The consistency observed between LLM-based and human-based assessments highlights the potential of LLMs not only as modeling assistants but also as reliable evaluators in automated requirements engineering workflows, offering practical insights into the capabilities and limitations of LLM-driven UML class diagram automation.

SE · Feb 6
Software Self-Extension with SelfEvolve: an Agentic Architecture for Runtime Code Generation

Md Asif Iqbal Fahim, Oluwadamilola Adebayo, Alessio Ferrari

Traditional self-adaptive systems automatically reconfigure existing components in response to changing requirements, but provide limited support for the generation of novel functionalities. The software generation capabilities of large language models (LLMs) open the possibility to create entirely new modules at runtime, enabling a form of self-evolution beyond traditional self-adaptation. We present SelfEvolve, an orchestrated agentic pipeline architecture enabling runtime self-extension (the autonomous addition of new capabilities during execution) as a preliminary form of self-evolution. Self-extension focuses on the autonomous generation and integration of new functions, based on user requests, without requiring a system restart or developer intervention. Evaluation of our architecture across 11 self-extension tasks demonstrates an average Pass@1 of 92.7% (51/55), outperforming developer-focused code generation baselines like AutoGen, MetaGPT, and AgentCoder. SelfEvolve achieves 61.8% improvement over the best baseline, i.e., AutoGen, with statistical significance. This work demonstrates the feasibility of runtime capability extension through autonomous code generation. This provides preliminary evidence for a paradigm in which systems autonomously evolve to satisfy user needs, paving the way towards individualised, self-improving systems.
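The reported Pass@1 figure can be checked with simple arithmetic. The sketch below assumes Pass@1 is computed here as the plain fraction of attempts that pass (51 of 55); the helper name `pass_at_1` is hypothetical, and how the 55 attempts are distributed over the 11 tasks is not stated in the abstract.

```python
def pass_at_1(passed, attempts):
    """Fraction of single attempts that pass (assumed reading of Pass@1)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return passed / attempts

rate = pass_at_1(51, 55)
print(f"{rate:.1%}")  # prints 92.7%
```

The 61.8% improvement over the baseline is not recomputed here, since the abstract does not give the baseline's score or say whether the improvement is relative or absolute.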

SE · Apr 9, 2024
Model Generation with LLMs: From Requirements to UML Sequence Diagrams

Alessio Ferrari, Sallam Abualhaija, Chetan Arora

Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., UML sequence diagrams, from NL requirements. We conduct a qualitative study in which we examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains. Observations from the analysis of the generated diagrams have systematically been captured through evaluation logs, and categorized through thematic analysis. Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges. This issue is particularly pronounced in the presence of requirements smells, such as ambiguity and inconsistency. The insights derived from this study can influence the practical utilization of LLMs in the RE process, and open the door to novel RE-specific prompting strategies targeting effective model generation.

CL · Apr 23, 2025
How Effective are Generative Large Language Models in Performing Requirements Classification?

Waad Alhoshan, Alessio Ferrari, Liping Zhao

In recent years, transformer-based large language models (LLMs) have revolutionised natural language processing (NLP), with generative models opening new possibilities for tasks that require context-aware text generation. Requirements engineering (RE) has also seen a surge in the experimentation of LLMs for different tasks, including trace-link detection, regulatory compliance, and others. Requirements classification is a common task in RE. While non-generative LLMs like BERT have been successfully applied to this task, there has been limited exploration of generative LLMs. This gap raises an important question: how well can generative LLMs, which produce context-aware outputs, perform in requirements classification? In this study, we explore the effectiveness of three generative LLMs (Bloom, Gemma, and Llama) in performing both binary and multi-class requirements classification. We design an extensive experimental study involving over 400 experiments across three widely used datasets (PROMISE NFR, Functional-Quality, and SecReq). Our study concludes that while factors like prompt design and LLM architecture are universally important, others, such as dataset variations, have a more situational impact, depending on the complexity of the classification task. This insight can guide future model development and deployment strategies, focusing on optimising prompt structures and aligning model architectures with task-specific needs for improved performance.

SE · Oct 24, 2025
Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification

Mohammad Amin Zadenoori, Vincenzo De Martino, Jacek Dabrowski et al.

[Context and motivation] Large language models (LLMs) show notable results in natural language processing (NLP) tasks for requirements engineering (RE). However, their use is compromised by high computational cost, data sharing risks, and dependence on external services. In contrast, small language models (SLMs) offer a lightweight, locally deployable alternative. [Question/problem] It remains unclear how well SLMs perform compared to LLMs in RE tasks in terms of accuracy. [Results] Our preliminary study compares eight models, including three LLMs and five SLMs, on requirements classification tasks using the PROMISE, PROMISE Reclass, and SecReq datasets. Our results show that although LLMs achieve an average F1 score 2% higher than SLMs, this difference is not statistically significant. SLMs almost reach LLMs' performance across all datasets and even outperform them in recall on the PROMISE Reclass dataset, despite being up to 300 times smaller. We also found that dataset characteristics play a more significant role in performance than model size. [Contribution] Our study contributes with evidence that SLMs are a valid alternative to LLMs for requirements classification, offering advantages in privacy, cost, and local deployability.

AI · Mar 12, 2025
LLM-Guided Indoor Navigation with Multimodal Map Understanding

Alberto Coffrini, Paolo Barsocchi, Francesco Furfari et al.

Indoor navigation presents unique challenges due to complex layouts and the unavailability of GNSS signals. Existing solutions often struggle with contextual adaptation, and typically require dedicated hardware. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 86.59% correct indications and a maximum of 97.14%. The proposed system achieves high accuracy and reasoning performance. These results have key implications for AI-driven navigation and assistive technologies.

GEO-PH · Jan 29, 2025
A finite element-based machine learning model for hydro-mechanical analysis of swelling behavior in clay-sulfate rocks

Reza Taherdangkoo, Mostafa Mollaali, Matthias Ehrhardt et al.

The hydro-mechanical behavior of clay-sulfate rocks, especially their swelling properties, poses significant challenges in geotechnical engineering. This study presents a hybrid constrained machine learning (ML) model developed using the categorical boosting algorithm (CatBoost) tuned with a Bayesian optimization algorithm to predict and analyze the swelling behavior of these complex geological materials. Initially, a coupled hydro-mechanical model based on the Richards' equation coupled to a deformation process with linear kinematics implemented within the finite element framework OpenGeoSys was used to simulate the observed ground heave in Staufen, Germany, caused by water inflow into the clay-sulfate bearing Triassic Grabfeld Formation. A systematic parametric analysis using Gaussian distributions of key parameters, including Young's modulus, Poisson's ratio, maximum swelling pressure, permeability, and air entry pressure, was performed to construct a synthetic database. The ML model takes time, spatial coordinates, and these parameter values as inputs, while water saturation, porosity, and vertical displacement are outputs. In addition, penalty terms were incorporated into the CatBoost objective function to enforce physically meaningful predictions. Results show that the hybrid approach effectively captures the nonlinear and dynamic interactions that govern hydro-mechanical processes. The study demonstrates the ability of the model to predict the swelling behavior of clay-sulfate rocks, providing a robust tool for risk assessment and management in affected regions. The results highlight the potential of ML-driven models to address complex geotechnical challenges.

SE · Jul 12, 2021
Formal Methods in Railways: a Systematic Mapping Study

Alessio Ferrari, Maurice H. ter Beek

Formal methods are mathematically-based techniques for the rigorous development of software-intensive systems. The railway signaling domain is a field in which formal methods have traditionally been applied, with several success stories. This article reports on a mapping study that surveys the landscape of research on applications of formal methods to the development of railway systems. Our main results are as follows: (i) we identify a total of 328 primary studies relevant to our scope published between 1989 and 2020, of which 44% were published during the last 5 years and 24% involve industry; (ii) the majority of studies are evaluated through Examples (41%) and Experience Reports (38%), while full-fledged Case Studies are limited (1.5%); (iii) model checking is the most commonly adopted technique (47%), followed by simulation (27%) and theorem proving (19.5%); (iv) the dominant languages are UML (18%) and B (15%), while frequently used tools are ProB (9%), NuSMV (8%) and UPPAAL (7%); however, a diverse landscape of languages and tools is employed; (v) the majority of systems are interlocking products (40%), followed by models of high-level control logic (27%); (vi) most of the studies focus on the Architecture (66%) and Detailed Design (45%) development phases. Based on these findings, we highlight current research gaps and expected actions. In particular, we stress the need to focus on more empirically sound research methods, such as Case Studies and Controlled Experiments, and to lower the degree of abstraction, by applying formal methods and tools to development phases that are closer to software development. Our study contributes with an empirically based perspective on the future of research and practice in formal methods applications for railways.

SE · May 6, 2021
Rethinking Sustainability Requirements: Drivers, Barriers and Impacts of Digitalisation from the Viewpoint of Experts

Alessio Ferrari, Manlio Bacco, Kirsten Moore et al.

Requirements engineering (RE) is a key area to address sustainability concerns in system development. Approaches have been proposed to elicit sustainability requirements from interested stakeholders before system design. However, existing strategies lack the proper high-level view to deal with the societal and long-term impacts of the transformation entailed by the introduction of a new technological solution. This paper proposes to go beyond the concept of system requirements and stakeholders' goals, and raise the degree of abstraction by focusing on the notions of drivers, barriers and impacts that a system can have on the environment in which it is deployed. Furthermore, we suggest narrowing the perspective to a single domain, as the effect of a technology is context-dependent. To put this vision into practice, we interview 30 cross-disciplinary experts in the representative domain of rural areas, and we analyse the transcripts to identify common themes. As a result, we provide drivers, barriers and positive or negative impacts associated with the introduction of novel technical solutions in rural areas. This RE-relevant information could hardly be identified if interested stakeholders were interviewed before the development of a single specific system. This paper contributes to the literature with a fresh perspective on sustainability requirements, and with a domain-specific framework grounded on experts' opinions. The conceptual framework resulting from our analysis can be used as a reference baseline for requirements elicitation endeavours in rural areas that need to account for sustainability concerns.

SE · Apr 6, 2021
Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews

Alessio Ferrari, Thaide Huichapa, Paola Spoletini et al.

Capturing users' engagement is crucial for gathering feedback about the features of a software product. In a market-driven context, current approaches to collect and analyze users' feedback are based on techniques leveraging information extracted from product reviews and social media. These approaches are hardly applicable in bespoke software development, or in contexts in which one needs to gather information from specific users. In such cases, companies need to resort to face-to-face interviews to get feedback on their products. In this paper, we propose to utilize biometric data, in terms of physiological and voice features, to complement interviews with information about the engagement of the user on the discussed product-relevant topics. We evaluate our approach by interviewing users while gathering their physiological data (i.e., biofeedback) using an Empatica E4 wristband, and capturing their voice through the default audio-recorder of a common laptop. Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data (F1=0.72), and that voice features alone are sufficiently effective (F1=0.71). Our work contributes with one of the first studies in requirements engineering in which biometrics are used to identify emotions. This is also the first study in software engineering that considers voice analysis. The usage of voice features could be particularly helpful for emotion-aware requirements elicitation in remote communication, either performed by human analysts or voice-based chatbots, and can also be exploited to support the analysis of meetings in software engineering research.

SE · Jan 27, 2021
Systematic Evaluation and Usability Analysis of Formal Tools for Railway System Design

Alessio Ferrari, Franco Mazzanti, Davide Basile et al.

Formal methods and supporting tools have a long record of success in the development of safety-critical systems. However, no single tool has emerged as the dominant solution for system design. Each tool differs from the others in terms of the modeling language used, its verification capabilities and other complementary features, and each development context has peculiar needs that require different tools. This is particularly problematic for the railway industry, in which formal methods are highly recommended by the norms, but no actual guidance is provided for the selection of tools. To guide companies in the selection of the most appropriate formal tools to adopt in their contexts, a clear assessment of the features of the currently available tools is required. To address this goal, this paper considers a set of 13 formal tools that have been used for railway system design, and it presents a systematic evaluation of such tools and a preliminary usability analysis of a subset of 7 tools, involving railway practitioners. The results are discussed considering the most desired aspects by industry and earlier related studies. While the focus is on the railway domain, the overall methodology can be applied to similar contexts. Our study thus contributes with a systematic evaluation of formal tools and it shows that despite the poor graphical interfaces, usability and maturity of the tools are not major problems, as claimed by contributions from the literature. Instead, support for process integration is the most relevant obstacle for the adoption of most of the tools.

SE · Apr 2, 2020
Natural Language Processing (NLP) for Requirements Engineering: A Systematic Mapping Study

Liping Zhao, Waad Alhoshan, Alessio Ferrari et al.

Natural language processing supported requirements engineering is an area of research and development that seeks to apply NLP techniques, tools and resources to a variety of requirements documents or artifacts to support a range of linguistic analysis tasks performed at various RE phases. Such tasks include detecting language issues, identifying key domain concepts and establishing traceability links between requirements. This article surveys the landscape of NLP4RE research to understand the state of the art and identify open problems. The systematic mapping study approach is used to conduct this survey, which identified 404 relevant primary studies and reviewed them according to five research questions, cutting across five aspects of NLP4RE research, concerning the state of the literature, the state of empirical research, the research focus, the state of the practice, and the NLP technologies used. Results: 1) NLP4RE is an active and thriving research area in RE that has amassed a large number of publications and attracted widespread attention from diverse communities; 2) most NLP4RE studies are solution proposals having only been evaluated using a laboratory experiment or an example application; 3) most studies have focused on the analysis phase, with detection as their central linguistic analysis task and requirements specification as their commonly processed document type; 4) 130 new tools have been proposed to support a range of linguistic analysis tasks, but there is little evidence of adoption in the long term, although some industrial applications have been published; 5) 140 NLP techniques, 66 NLP tools and 25 NLP resources are extracted from the selected studies.

SE · Mar 27, 2018
Ten Diverse Formal Models for a CBTC Automatic Train Supervision System

Franco Mazzanti, Alessio Ferrari

Communications-based Train Control (CBTC) systems are metro signalling platforms, which coordinate and protect the movements of trains within the tracks of a station, and between different stations. In CBTC platforms, a prominent role is played by the Automatic Train Supervision (ATS) system, which automatically dispatches and routes trains within the metro network. Among the various functions, an ATS needs to avoid deadlock situations, i.e., cases in which a group of trains block each other. In the context of a technology transfer study, we designed an algorithm for deadlock avoidance in train scheduling. In this paper, we present a case study in which the algorithm has been applied. The case study has been encoded using ten different formal verification environments, namely UMC, SPIN, NuSMV/nuXmv, mCRL2, CPN Tools, FDR4, CADP, TLA+, UPPAAL and ProB. Based on our experience, we observe commonalities and differences among the modelling languages considered, and we highlight the impact of the specific characteristics of each language on the presented models.
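The deadlock notion used in this abstract (a group of trains blocking each other) can be illustrated as a cycle in a "waits-for" graph. The sketch below is only a detection check under that assumed graph representation; it is not the authors' deadlock avoidance algorithm, which the abstract does not describe, and the function name `find_deadlock` and the train identifiers are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of trains forming a waits-for cycle, or None if there is none.

    waits_for maps each train to the trains it is currently blocked by.
    """
    visited = set()  # trains whose exploration has at least started

    def dfs(train, path):
        if train in path:
            # The current chain of blocked trains loops back on itself.
            return path[path.index(train):]
        if train in visited:
            # Already explored from here without finding a cycle.
            return None
        visited.add(train)
        for other in waits_for.get(train, ()):
            cycle = dfs(other, path + [train])
            if cycle:
                return cycle
        return None

    for train in waits_for:
        cycle = dfs(train, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, and T3 for T1: the three trains block each other.
print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))
# A simple chain with no cycle is not a deadlock.
print(find_deadlock({"T1": ["T2"], "T2": []}))
```

An avoidance algorithm would have to rule out such cycles before granting a movement, rather than detect them afterwards, so this check should be read as an illustration of the definition only.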