AIOct 17, 2022
A.I. Robustness: a Human-Centered Perspective on Technological Challenges and OpportunitiesAndrea Tocchetti, Lorenzo Corti, Agathe Balayn et al.
Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness remains elusive and constitutes a key issue that impedes large-scale adoption. Robustness has been studied in many domains of AI, yet with different interpretations across domains and contexts. In this work, we systematically survey the recent progress to provide a reconciled terminology of concepts around AI robustness. We introduce three taxonomies to organize and describe the literature both from a fundamental and applied point of view: 1) robustness by methods and approaches in different phases of the machine learning pipeline; 2) robustness for specific model architectures, tasks, and systems; and in addition, 3) robustness assessment methodologies and insights, particularly the trade-offs with other trustworthiness properties. Finally, we identify and discuss research gaps and opportunities and give an outlook on the field. We highlight the central role of humans in evaluating and enhancing AI robustness, considering the necessary knowledge humans can provide, and discuss the need for better understanding practices and developing supportive tools in the future.
CLMar 11Code
MDER-DR: Multi-Hop Question Answering with Entity-Centric SummariesRiccardo Campi, Nicolò Oreste Pinciroli Vago, Mathyas Giudici et al.
Retrieval-Augmented Generation (RAG) over Knowledge Graphs (KGs) suffers from the fact that indexing approaches may lose important contextual nuance when text is reduced to triples, thereby degrading performance in downstream Question-Answering (QA) tasks, particularly for multi-hop QA, which requires composing answers from multiple entities, facts, or relations. We propose a domain-agnostic, KG-based QA framework that covers both the indexing and retrieval/inference phases. A new indexing approach called Map-Disambiguate-Enrich-Reduce (MDER) generates context-derived triple descriptions and subsequently integrates them with entity-level summaries, thus avoiding the need for explicit traversal of edges in the graph during the QA retrieval phase. Complementing this, we introduce Decompose-Resolve (DR), a retrieval mechanism that decomposes user queries into resolvable triples and grounds them in the KG via iterative reasoning. Together, MDER and DR form an LLM-driven QA pipeline that is robust to sparse, incomplete, and complex relational data. Experiments show that on standard and domain specific benchmarks, MDER-DR achieves substantial improvements over standard RAG baselines (up to 66%), while maintaining cross-lingual robustness. Our code is available at https://github.com/DataSciencePolimi/MDER-DR_RAG.
CVMay 19Code
A Framework for Evaluating Zero-Shot Image Generation in Concept-based ExplainabilityGiacomo Astolfi, Matteo Bianchi, Riccardo Campi et al.
Concept-based Explainable Artificial Intelligence (XAI) interprets deep learning models using human-understandable visual features (e.g., textures or object parts) by linking internal representations to class predictions, thereby bridging the gap between low-level image data and high-level semantics. A major challenge, however, is the reliance on large sets of labeled images to represent each concept, which limits scalability. In this work, we investigate the use of zero-shot Text-to-Image (T2I) generative models as a source of synthetic concept datasets for concept-based XAI methods. Specifically, we generate concepts using predefined prompts and evaluate their faithfulness to real ones through four complementary analyses: (1) comparing synthetic vs. real concept images via concept representation similarity; (2) evaluating their intra-similarity by comparing pairs of subsets of the same concept with progressively increasing size; (3) evaluating their performance for downstream explanation tasks using relevant class images; (4) evaluating how removing a concept from tested class images affects explanations of generated concepts. While current T2I generative models promise a shortcut to concept-based XAI, our study highlights challenges and raises open questions about the use of synthetic data generated by zero-shot pipelines in model analyses. The resulting dataset is available at https://github.com/DataSciencePolimi/ZeroShot-T2I-Concepts.
CLApr 18Code
Comparing Human and Large Language Model Interpretation of Implicit InformationAntonio De Santis, Tommaso Bonetti, Andrea Tocchetti et al.
The interpretation of implicit meanings is an integral aspect of human communication. However, this framework may not transfer to interactions with Large Language Models (LLMs). To investigate this, we introduce the task of Implicit Information Extraction (IIE) and propose an LLM-based IIE pipeline that builds a structured knowledge graph from a context sentence by extracting relational triplets, validating implicit inferences, and analyzing temporal relations. We evaluate two LLMs against crowdsourced human judgments on two datasets. We find that humans agree with most model triplets yet consistently propose many additions, indicating limited coverage in current LLM-based IIE. Moreover, in our experiments, models appear to be more conservative about implicit inferences than humans in socially rich contexts, whereas humans become more conservative in shorter, fact-oriented contexts. Our code is available at https://github.com/Antonio-Dee/IIE_from_LLM.
AIAug 3, 2024
Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test DataAntonio De Santis, Marco Balduini, Federico De Santis et al.
Aerospace manufacturing companies, such as Thales Alenia Space, design, develop, integrate, verify, and validate products characterized by high complexity and low volume. They carefully document all phases for each product but analyses across products are challenging due to the heterogeneity and unstructured nature of the data in documents. In this paper, we propose a hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with Large Language Models (LLMs) to extract and validate data contained in these documents. We consider a case study focused on test data related to electronic boards for satellites. To do so, we extend the Semantic Sensor Network ontology. We store the metadata of the reports in a KG, while the actual test results are stored in parquet accessible via a Virtual Knowledge Graph. The validation process is managed using an LLM-based approach. We also conduct a benchmarking study to evaluate the performance of state-of-the-art LLMs in executing this task. Finally, we analyze the costs and benefits of automating preexisting processes of manual data extraction and validation for subsequent cross-report analyses.
IRMar 20
LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace DomainAntonio De Santis, Marco Balduini, Matteo Belcao et al.
Large manufacturing companies face challenges in information retrieval due to data silos maintained by different departments, leading to inconsistencies and misalignment across databases. This paper presents an experience in integrating and retrieving qualification data for electronic components used in satellite board design. Due to data silos, designers cannot immediately determine the qualification status of individual components. However, this process is critical during the planning phase, when assembly drawings are issued before production, to optimize new qualifications and avoid redundant efforts. To address this, we propose a pipeline that uses Virtual Knowledge Graphs for a unified view over heterogeneous data sources and LLMs to enhance retrieval and reduce manual effort in data cleansing. The retrieval of qualifications is then performed through an Ontology-based Data Access approach for structured queries and a vector search mechanism for retrieving qualifications based on similar textual properties. We perform a comparative cost-benefit analysis, demonstrating that the proposed pipeline also outperforms approaches relying solely on LLMs, such as Retrieval-Augmented Generation (RAG), in terms of long-term efficiency.
CVNov 8, 2024Code
Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image ClassificationAntonio De Santis, Riccardo Campi, Matteo Bianchi et al.
Convolutional Neural Networks (CNNs) have seen significant performance improvements in recent years. However, due to their size and complexity, they function as black-boxes, leading to transparency concerns. State-of-the-art saliency methods generate local explanations that highlight the area in the input image where a class is identified but cannot explain how a concept of interest contributes to the prediction, which is essential for bias mitigation. On the other hand, concept-based methods, such as TCAV (Testing with Concept Activation Vectors), provide insights into how sensitive is the network to a concept, but cannot compute its attribution in a specific prediction nor show its location within the input image. This paper introduces a novel post-hoc explainability framework, Visual-TCAV, which aims to bridge the gap between these methods by providing both local and global explanations for CNN-based image classification. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate saliency maps that show where concepts are recognized by the network. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. This framework is evaluated on popular CNN architectures, with its validity further confirmed via experiments where ground truth for explanations is known, and a comparison with TCAV. Our code is available at https://github.com/DataSciencePolimi/Visual-TCAV.
LGMar 7Code
Learning Concept Bottleneck Models from Mechanistic ExplanationsAntonio De Santis, Schrasing Tong, Marco Brambilla et al.
Concept Bottleneck Models (CBMs) aim for ante-hoc interpretability by learning a bottleneck layer that predicts interpretable concepts before the decision. State-of-the-art approaches typically select which concepts to learn via human specification, open knowledge graphs, prompting an LLM, or using general CLIP concepts. However, concepts defined a-priori may not have sufficient predictive power for the task or even be learnable from the available data. As a result, these CBMs often significantly trail their black-box counterpart when controlling for information leakage. To address this, we introduce a novel CBM pipeline named Mechanistic CBM (M-CBM), which builds the bottleneck directly from a black-box model's own learned concepts. These concepts are extracted via Sparse Autoencoders (SAEs) and subsequently named and annotated on a selected subset of images using a Multimodal LLM. For fair comparison and leakage control, we also introduce the Number of Contributing Concepts (NCC), a decision-level sparsity metric that extends the recently proposed NEC metric. Across diverse datasets, we show that M-CBMs consistently surpass prior CBMs at matched sparsity, while improving concept predictions and providing concise explanations. Our code is available at https://github.com/Antonio-Dee/M-CBM.
CLNov 3, 2025
A Graph-based RAG for Energy Efficiency Question AnsweringRiccardo Campi, Nicolò Oreste Pinciroli Vago, Mathyas Giudici et al.
In this work, we investigate the use of Large Language Models (LLMs) within a graph-based Retrieval Augmented Generation (RAG) architecture for Energy Efficiency (EE) Question Answering. First, the system automatically extracts a Knowledge Graph (KG) from guidance and regulatory documents in the energy field. Then, the generated graph is navigated and reasoned upon to provide users with accurate answers in multiple languages. We implement a human-based validation using the RAGAs framework properties, a validation dataset comprising 101 question-answer pairs, and domain experts. Results confirm the potential of this architecture and identify its strengths and weaknesses. Validation results show how the system correctly answers in about three out of four of the cases (75.2 +- 2.7%), with higher results on questions related to more general EE answers (up to 81.0 +- 4.1%), and featuring promising multilingual abilities (4.4% accuracy loss due to translation).
LGMay 6, 2024
Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image ClassificationMatteo Bianchi, Antonio De Santis, Andrea Tocchetti et al.
Transparency and explainability in image classification are essential for establishing trust in machine learning models and detecting biases and errors. State-of-the-art explainability methods generate saliency maps to show where a specific class is identified, without providing a detailed explanation of the model's decision process. Striving to address such a need, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps generated by clustering and merging similar feature maps, to which we associate a weight derived by generalizing Grad-CAM for the proposed methodology. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generate global explanations by aggregating labels across multiple images.
CLOct 5, 2021
Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence EmbeddingsMarco Di Giovanni, Marco Brambilla
Semantic sentence embeddings are usually supervisedly built minimizing distances between pairs of embeddings of sentences labelled as semantically similar by annotators. Since big labelled datasets are rare, in particular for non-English languages, and expensive, recent studies focus on unsupervised approaches that require not-paired input sentences. We instead propose a language-independent approach to build large datasets of pairs of informal texts weakly similar, without manual human effort, exploiting Twitter's intrinsic powerful signals of relatedness: replies and quotes of tweets. We use the collected pairs to train a Transformer model with triplet-like structures, and we test the generated embeddings on Twitter NLP similarity tasks (PIT and TURL) and STSb. We also introduce four new sentence ranking evaluation benchmarks of informal texts, carefully extracted from the initial collections of tweets, proving not only that our best model learns classical Semantic Textual Similarity, but also excels on tasks where pairs of sentences are not exact paraphrases. Ablation studies reveal how increasing the corpus size influences positively the results, even at 2M samples, suggesting that bigger collections of Tweets still do not contain redundant information about semantic similarities.
CLOct 12, 2020
EFSG: Evolutionary Fooling Sentences GeneratorMarco Di Giovanni, Marco Brambilla
Large pre-trained language representation models (LMs) have recently collected a huge number of successes in many NLP tasks. In 2018 BERT, and later its successors (e.g. RoBERTa), obtained state-of-the-art results in classical benchmark tasks, such as GLUE benchmark. After that, works about adversarial attacks have been published to test their generalization proprieties and robustness. In this work, we design Evolutionary Fooling Sentences Generator (EFSG), a model- and task-agnostic adversarial attack algorithm built using an evolutionary approach to generate false-positive sentences for binary classification tasks. We successfully apply EFSG to CoLA and MRPC tasks, on BERT and RoBERTa, comparing performances. Results prove the presence of weak spots in state-of-the-art LMs. We finally test adversarial training as a data augmentation defence approach against EFSG, obtaining stronger improved models with no loss of accuracy when tested on the original datasets.
LGOct 10, 2020
Semi-supervised Neural Networks solve an inverse problem for modeling Covid-19 spreadAlessandro Paticchio, Tommaso Scarlatti, Marios Mattheakis et al.
Studying the dynamics of COVID-19 is of paramount importance to understanding the efficiency of restrictive measures and develop strategies to defend against upcoming contagion waves. In this work, we study the spread of COVID-19 using a semi-supervised neural network and assuming a passive part of the population remains isolated from the virus dynamics. We start with an unsupervised neural network that learns solutions of differential equations for different modeling parameters and initial conditions. A supervised method then solves the inverse problem by estimating the optimal conditions that generate functions to fit the data for those infected by, recovered from, and deceased due to COVID-19. This semi-supervised approach incorporates real data to determine the evolution of the spread, the passive population, and the basic reproduction number for different countries.
CVApr 8, 2019
Improving Image Classification Robustness through Selective CNN-Filters Fine-TuningAlessandro Bianchi, Moreno Raimondo Vendra, Pavlos Protopapas et al.
Image quality plays a big role in CNN-based image classification performance. Fine-tuning the network with distorted samples may be too costly for large networks. To solve this issue, we propose a transfer learning approach optimized to keep into account that in each layer of a CNN some filters are more susceptible to image distortion than others. Our method identifies the most susceptible filters and applies retraining only to the filters that show the highest activation maps distance between clean and distorted images. Filters are ranked using the Borda count election method and then only the most affected filters are fine-tuned. This significantly reduces the number of parameters to retrain. We evaluate this approach on the CIFAR-10 and CIFAR-100 datasets, testing it on two different models and two different types of distortion. Results show that the proposed transfer learning technique recovers most of the lost performance due to input data distortion, at a considerably faster pace with respect to existing methods, thanks to the reduced number of parameters to fine-tune. When few noisy samples are provided for training, our filter-level fine tuning performs particularly well, also outperforming state of the art layer-level transfer learning approaches.
LGNov 20, 2018
T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular SamplingGiorgia Ramponi, Pavlos Protopapas, Marco Brambilla et al.
In this paper we propose a data augmentation method for time series with irregular sampling, Time-Conditional Generative Adversarial Network (T-CGAN). Our approach is based on Conditional Generative Adversarial Networks (CGAN), where the generative step is implemented by a deconvolutional NN and the discriminative step by a convolutional NN. Both the generator and the discriminator are conditioned on the sampling timestamps, to learn the hidden relationship between data and timestamps, and consequently to generate new time series. We evaluate our model with synthetic and real-world datasets. For the synthetic data, we compare the performance of a classifier trained with T-CGAN-generated data, against the performance of the same classifier trained on the original data. Results show that classifiers trained on T-CGAN-generated data perform the same as classifiers trained on real data, even with very short time series and small training sets. For the real world datasets, we compare our method with other techniques of data augmentation for time series, such as time slicing and time warping, over a classification problem with unbalanced datasets. Results show that our method always outperforms the other approaches, both in case of regularly sampled and irregularly sampled time series. We achieve particularly good performance in case with a small training set and short, noisy, irregularly-sampled time series.