Giovanni Quattrocchi

SE
h-index: 23
4 papers
16 citations
Novelty: 55%
AI Score: 42

4 Papers

26.3 SE May 8
"Show Me You Comply... Without Showing Me Anything": Zero-Knowledge Software Auditing for AI-Enabled Systems

Filippo Scaramuzza, Renato Cordeiro Ferreira, Giovanni Quattrocchi et al.

Classical software verification and validation techniques, such as procedural audits, formal methods, or model documentation, are the traditional mechanisms used to achieve the verifiable accountability now required by regulations like the EU AI Act. These methods are either expensive or heavily manual, and ill-suited for the opaque, "black box" nature of most Artificial Intelligence (AI) models. A conflict arises: high auditability and verifiability are required by law, but such transparency conflicts with the need to protect the assets being audited (e.g., confidential data and proprietary models). This paper introduces ZKMLOps, an MLOps verification framework that operationalizes Zero-Knowledge Proofs (ZKPs) within Machine-Learning Operations lifecycles; a ZKP allows a prover to convince a verifier that a statement is true without revealing any information about the statement itself. By integrating ZKP with established software engineering patterns, ZKMLOps provides a modular and repeatable process for generating verifiable cryptographic evidence (proofs of well-defined computational statements about the audited model and its inputs) that auditors can use as input to a regulatory compliance determination. We evaluate the framework along two dimensions. First, framework viability: orchestration overhead is bounded and stable across architecturally heterogeneous ZKP backends and models of increasing size. Second, cost-versus-assurance trade-offs: the audit-on-demand setting is the regime in which full zero-knowledge auditing is the appropriate tool, where it provides confidentiality and integrity guarantees that lighter-weight alternatives cannot match.

SE Jul 20, 2025
Can LLMs Generate User Stories and Assess Their Quality?

Giovanni Quattrocchi, Liliana Pasquale, Paola Spoletini et al.

Requirements elicitation is still one of the most challenging activities of the requirements engineering process due to the difficulty requirements analysts face in understanding and translating complex needs into concrete requirements. In addition, specifying high-quality requirements is crucial, as it can directly impact the quality of the software to be developed. Although automated tools allow for assessing the syntactic quality of requirements, evaluating semantic metrics (e.g., language clarity, internal consistency) remains a manual and time-consuming activity. This paper explores how LLMs can help automate requirements elicitation within agile frameworks, where requirements are defined as user stories (US). We used 10 state-of-the-art LLMs to investigate their ability to generate US automatically by emulating customer interviews. We evaluated the quality of US generated by LLMs, comparing it with the quality of US generated by humans (domain experts and students). We also explored whether and how LLMs can be used to automatically evaluate the semantic quality of US. Our results indicate that LLMs can generate US similar to humans in terms of coverage and stylistic quality, but exhibit lower diversity and creativity. Although LLM-generated US are generally comparable in quality to those created by humans, they tend to meet the acceptance quality criteria less frequently, regardless of the scale of the LLM model. Finally, LLMs can reliably assess the semantic quality of US when provided with clear evaluation criteria and have the potential to reduce human effort in large-scale assessments.

CV Feb 5, 2025
DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation

Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'udi et al.

Ensuring the robustness of deep learning models requires comprehensive and diverse testing. Existing approaches, often based on simple data augmentation techniques or generative adversarial networks, are limited in producing realistic and varied test cases. To address these limitations, we present a novel framework for testing vision neural networks that leverages Large Language Models and control-conditioned Diffusion Models to generate synthetic, high-fidelity test cases. Our approach begins by translating images into detailed textual descriptions using a captioning model, allowing the language model to identify modifiable aspects of the image and generate counterfactual descriptions. These descriptions are then used to produce new test images through a text-to-image diffusion process that preserves spatial consistency and maintains the critical elements of the scene. We demonstrate the effectiveness of our method using two datasets: ImageNet1K for image classification and SHIFT for semantic segmentation in autonomous driving. The results show that our approach can generate significant test cases that reveal weaknesses and improve the robustness of the model through targeted retraining. We conducted a human assessment using Mechanical Turk to validate the generated images. The responses from the participants confirmed, with high agreement among the voters, that our approach produces valid and realistic images.

SE Mar 25, 2020
Quality Assurance of Heterogeneous Applications: The SODALITE Approach

Indika Kumara, Giovanni Quattrocchi, Damian Tamburri et al.

A key focus of the SODALITE project is to assure the quality and performance of the deployments of applications over heterogeneous Cloud and HPC environments. It offers a set of tools to detect and correct errors, smells, and bugs in the deployment models and their provisioning workflows, and a framework to monitor and refactor deployment model instances at runtime. This paper presents the objectives, designs, and early results of the quality assurance framework and the refactoring framework.