SIOct 17, 2022
Exposing Influence Campaigns in the Age of LLMs: A Behavioral-Based AI Approach to Detecting State-Sponsored TrollsFatima Ezzeddine, Luca Luceri, Omran Ayoub et al.
The detection of state-sponsored trolls operating in influence campaigns on social media is a critical and unsolved challenge for the research community, which has significant implications beyond the online realm. To address this challenge, we propose a new AI-based solution that identifies troll accounts solely through behavioral cues associated with their sequences of sharing activity, encompassing both their actions and the feedback they receive from others. Our approach does not incorporate any textual content shared and consists of two steps: First, we leverage an LSTM-based classifier to determine whether account sequences belong to a state-sponsored troll or an organic, legitimate user. Second, we employ the classified sequences to calculate a metric named the "Troll Score", quantifying the degree to which an account exhibits troll-like behavior. To assess the effectiveness of our method, we examine its performance in the context of the 2016 Russian interference campaign during the U.S. Presidential election. Our experiments yield compelling results, demonstrating that our approach can identify account sequences with an AUC close to 99% and accurately differentiate between Russian trolls and organic users with an AUC of 91%. Notably, our behavioral-based approach holds a significant advantage in the ever-evolving landscape, where textual and linguistic properties can be easily mimicked by Large Language Models (LLMs): In contrast to existing language-based techniques, it relies on more challenging-to-replicate behavioral cues, ensuring greater resilience in identifying influence campaigns, especially given the potential increase in the usage of LLMs for generating inauthentic content. Finally, we assessed the generalizability of our solution to various entities driving different information operations and found promising results that will guide future research.
LGFeb 3
Explanations Leak: Membership Inference with Differential Privacy and Active Learning DefenseFatima Ezzeddine, Osama Zammar, Silvia Giordano et al.
Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
LGJan 28
Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual ExplanationsFatima Ezzeddine, Obaida Ammar, Silvia Giordano et al.
Explainable Artificial Intelligence (XAI) is becoming increasingly essential for enhancing the transparency of machine learning (ML) models. Among the various XAI techniques, counterfactual explanations (CFs) hold a pivotal role due to their ability to illustrate how changes in input features can alter an ML model's decision, thereby offering actionable recourse to users. Ensuring that individuals with comparable attributes and those belonging to different protected groups (e.g., demographic) receive similar and actionable recourse options is essential for trustworthy and fair decision-making. In this work, we address this challenge directly by focusing on the generation of fair CFs. Specifically, we start by defining and formulating fairness at: 1) individual fairness, ensuring that similar individuals receive similar CFs, 2) group fairness, ensuring equitable CFs across different protected groups and 3) hybrid fairness, which accounts for both individual and broader group-level fairness. We formulate the problem as an optimization task and propose a novel model-agnostic, reinforcement learning based approach to generate CFs that satisfy fairness constraints at both the individual and group levels, two objectives that are usually treated as orthogonal. As fairness metrics, we extend existing metrics commonly used for auditing ML models, such as equal choice of recourse and equal effectiveness across individuals and groups. We evaluate our approach on three benchmark datasets, showing that it effectively ensures individual and group fairness while preserving the quality of the generated CFs in terms of proximity and plausibility, and quantify the cost of fairness in the different levels separately. Our work opens a broader discussion on hybrid fairness and its role and implications for XAI and beyond CFs.
LGJul 11, 2025Code
Fair-FLIP: Fair Deepfake Detection with Fairness-Oriented Final Layer Input PrioritisingTomasz Szandala, Fatima Ezzeddine, Natalia Rusin et al.
Artificial Intelligence-generated content has become increasingly popular, yet its malicious use, particularly the deepfakes, poses a serious threat to public trust and discourse. While deepfake detection methods achieve high predictive performance, they often exhibit biases across demographic attributes such as ethnicity and gender. In this work, we tackle the challenge of fair deepfake detection, aiming to mitigate these biases while maintaining robust detection capabilities. To this end, we propose a novel post-processing approach, referred to as Fairness-Oriented Final Layer Input Prioritising (Fair-FLIP), that reweights a trained model's final-layer inputs to reduce subgroup disparities, prioritising those with low variability while demoting highly variable ones. Experimental results comparing Fair-FLIP to both the baseline (without fairness-oriented de-biasing) and state-of-the-art approaches show that Fair-FLIP can enhance fairness metrics by up to 30% while maintaining baseline accuracy, with only a negligible reduction of 0.25%. Code is available on Github: https://github.com/szandala/fair-deepfake-detection-toolbox
NIApr 8, 2024
Liquid Neural Network-based Adaptive Learning vs. Incremental Learning for Link Load Prediction amid Concept Drift due to Network FailuresOmran Ayoub, Davide Andreoletti, Aleksandra Knapińska et al.
Adapting to concept drift is a challenging task in machine learning, which is usually tackled using incremental learning techniques that periodically re-fit a learning model leveraging newly available data. A primary limitation of these techniques is their reliance on substantial amounts of data for retraining. The necessity of acquiring fresh data introduces temporal delays prior to retraining, potentially rendering the models inaccurate if a sudden concept drift occurs in-between two consecutive retrainings. In communication networks, such issue emerges when performing traffic forecasting following a~failure event: post-failure re-routing may induce a drastic shift in distribution and pattern of traffic data, thus requiring a timely model adaptation. In this work, we address this challenge for the problem of traffic forecasting and propose an approach that exploits adaptive learning algorithms, namely, liquid neural networks, which are capable of self-adaptation to abrupt changes in data patterns without requiring any retraining. Through extensive simulations of failure scenarios, we compare the predictive performance of our proposed approach to that of a reference method based on incremental learning. Experimental results show that our proposed approach outperforms incremental learning-based methods in situations where the shifts in traffic patterns are drastic.
LGApr 9, 2024
Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and ExplainabilityFatima Ezzeddine, Mirna Saad, Omran Ayoub et al.
Anomaly detection (AD), also referred to as outlier detection, is a statistical process aimed at identifying observations within a dataset that significantly deviate from the expected pattern of the majority of the data. Such a process finds wide application in various fields, such as finance and healthcare. While the primary objective of AD is to yield high detection accuracy, the requirements of explainability and privacy are also paramount. The first ensures the transparency of the AD process, while the second guarantees that no sensitive information is leaked to untrusted parties. In this work, we exploit the trade-off of applying Explainable AI (XAI) through SHapley Additive exPlanations (SHAP) and differential privacy (DP). We perform AD with different models and on various datasets, and we thoroughly evaluate the cost of privacy in terms of decreased accuracy and explainability. Our results show that the enforcement of privacy through DP has a significant impact on detection accuracy and explainability, which depends on both the dataset and the considered AD model. We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm.
LGApr 4, 2024
Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual ExplanationsFatima Ezzeddine, Omran Ayoub, Silvia Giordano
In recent years, there has been a notable increase in the deployment of machine learning (ML) models as services (MLaaS) across diverse production software applications. In parallel, explainable AI (XAI) continues to evolve, addressing the necessity for transparency and trustworthiness in ML models. XAI techniques aim to enhance the transparency of ML models by providing insights, in terms of model's explanations, into their decision-making process. Simultaneously, some MLaaS platforms now offer explanations alongside the ML prediction outputs. This setup has elevated concerns regarding vulnerabilities in MLaaS, particularly in relation to privacy leakage attacks such as model extraction attacks (MEA). This is due to the fact that explanations can unveil insights about the inner workings of the model which could be exploited by malicious users. In this work, we focus on investigating how model explanations, particularly counterfactual explanations (CFs), can be exploited for performing MEA within the MLaaS platform. We also delve into assessing the effectiveness of incorporating differential privacy (DP) as a mitigation strategy. To this end, we first propose a novel approach for MEA based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model of a target model exploiting CFs, without any knowledge about the training data distribution by the attacker. Then, we advise an approach for training CF generators incorporating DP to generate private CFs. We conduct thorough experimental evaluations on real-world datasets and demonstrate that our proposed KD-based MEA can yield a high-fidelity substitute model with a reduced number of queries with respect to baseline approaches. Furthermore, our findings reveal that including a privacy layer can allow mitigating the MEA. However, on the account of the quality of CFs, impacts the performance of the explanations.
CLOct 29, 2025
The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User DiscussionsAzza Bouleimen, Giordano De Marzo, Taehee Kim et al.
Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39\% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56\% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.
CRMay 13, 2025
On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model ExtractionFatima Ezzeddine, Rinad Akel, Ihab Sbeity et al.
Machine Learning as a Service (MLaaS) has gained important attraction as a means for deploying powerful predictive models, offering ease of use that enables organizations to leverage advanced analytics without substantial investments in specialized infrastructure or expertise. However, MLaaS platforms must be safeguarded against security and privacy attacks, such as model extraction (MEA) attacks. The increasing integration of explainable AI (XAI) within MLaaS has introduced an additional privacy challenge, as attackers can exploit model explanations particularly counterfactual explanations (CFs) to facilitate MEA. In this paper, we investigate the trade offs among model performance, privacy, and explainability when employing Differential Privacy (DP), a promising technique for mitigating CF facilitated MEA. We evaluate two distinct DP strategies: implemented during the classification model training and at the explainer during CF generation.
HCFeb 26, 2021
The Virtual Emotion Loop: Towards Emotion-Driven Services via Virtual RealityDavide Andreoletti, Luca Luceri, Tiziano Leidi et al.
The importance of emotions in service and in product design is well known. However, it is still not very well understood how users' emotions can be incorporated in a product or service lifecycle. We argue that this gap is due to a lack of a methodological framework for an effective investigation of the emotional response of persons when using products and services. Indeed, the emotional response of users is generally investigated by means of methods (e.g., surveys) that are not effective for this purpose. In our view, Virtual Reality (VR) technologies represent the perfect medium to evoke and recognize users' emotional response, as well as to prototype products and services (and, for the latter, even deliver them). In this paper, we first provide our definition of emotion-driven services, and then we propose a novel methodological framework, referred to as the Virtual-Reality-Based Emotion-Elicitation-and-Recognition loop (VEE-loop), that can be exploited to realize it. Specifically, the VEE-loop consists in a continuous monitoring of users' emotions, which are then provided to service designers as an implicit users' feedback. This information is used to dynamically change the content of the VR environment, until the desired affective state is solicited. Finally, we discuss issues and opportunities of this VEE-loop, and we also present potential applications of the VEE-loop in research and in various application areas.
CRJul 20, 2020
Privacy-Preserving Multi-Operator Contact Tracing for Early Detection of Covid19 ContagionsDavide Andreoletti, Omran Ayoub, Silvia Giordano et al.
The outbreak of coronavirus disease 2019 (covid-19) is imposing a severe worldwide lock-down. Contact tracing based on smartphones' applications (apps) has emerged as a possible solution to trace contagions and enforce a more sustainable selective quarantine. However, a massive adoption of these apps is required to reach the critical mass needed for effective contact tracing. As an alternative, geo-location technologies in next generation networks (e.g., 5G) can enable Mobile Operators (MOs) to perform passive tracing of users' mobility and contacts with a promised accuracy of down to one meter. To effectively detect contagions, the identities of positive individuals, which are known only by a Governmental Authority (GA), are also required. Note that, besides being extremely sensitive, these data might also be critical from a business perspective. Hence, MOs and the GA need to exchange and process users' geo-locations and infection status data in a privacy-preserving manner. In this work, we propose a privacy-preserving protocol that enables multiple MOs and the GA to share and process users' data to make only the final users discover the number of their contacts with positive individuals. The protocol is based on existing privacy-enhancing strategies that guarantee that users' mobility and infection status are only known to their MOs and to the GA, respectively. From extensive simulations, we observe that the cost to guarantee total privacy (evaluated in terms of data overhead introduced by the protocol) is acceptable, and can also be significantly reduced if we accept a negligible compromise in users' privacy.
SIJan 28, 2020
Detecting Troll Behavior via Inverse Reinforcement Learning: A Case Study of Russian Trolls in the 2016 US ElectionLuca Luceri, Silvia Giordano, Emilio Ferrara
Since the 2016 US Presidential election, social media abuse has been eliciting massive concern in the academic community and beyond. Preventing and limiting the malicious activity of users, such as trolls and bots, in their manipulation campaigns is of paramount importance for the integrity of democracy, public health, and more. However, the automated detection of troll accounts is an open challenge. In this work, we propose an approach based on Inverse Reinforcement Learning (IRL) to capture troll behavior and identify troll accounts. We employ IRL to infer a set of online incentives that may steer user behavior, which in turn highlights behavioral differences between troll and non-troll accounts, enabling their accurate classification. As a study case, we consider the troll accounts identified by the US Congress during the investigation of Russian meddling in the 2016 US Presidential election. We report promising results: the IRL-based approach is able to accurately detect troll accounts (AUC=89.1%). The differences in the predictive features between the two classes of accounts enables a principled understanding of the distinctive behaviors reflecting the incentives trolls and non-trolls respond to.