Andrea Vitaletti

LG
h-index: 26
Papers: 12
Citations: 193
Novelty: 25%
AI Score: 45

12 Papers

AP · Nov 29, 2016
Drift Removal in Plant Electrical Signals via IIR Filtering Using Wavelet Energy

Saptarshi Das, Barry Juans Ajiwibawa, Shre Kumar Chatterjee et al.

Plant electrical signals often contain low-frequency drifts, with or without the application of external stimuli. Quantifying the randomness in plant signals in a stimulus-specific way is hindered because the vital frequency content of the actual biological response is not yet known. Here we design an optimum Infinite Impulse Response (IIR) filter that removes the low-frequency drifts and preserves the frequency spectrum corresponding to the random component of the unstimulated plant signals, minimizing the bias due to unknown artifacts and drifts. We use energy criteria of the wavelet packet transform (WPT) for optimization-based tuning of the IIR filter parameters. Such an optimum filter ensures that the energy distributions of the pre-stimulus parts in different experiments almost overlap, whereas under different stimuli the energy distributions change. The reported research may popularize plant signal processing as a separate field alongside other conventional bioelectrical signal processing paradigms.
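To make the idea of drift removal with an IIR filter concrete, here is a minimal sketch of a fixed first-order IIR high-pass filter. It is not the authors' filter: the abstract describes parameters tuned by optimization on wavelet packet energy, whereas the function name `highpass_iir`, the cutoff, and the sample rate below are illustrative assumptions only.

```python
import math


def highpass_iir(signal, cutoff_hz, sample_rate_hz):
    """First-order IIR high-pass filter (hypothetical helper, not from the paper).

    Recurrence: y[n] = a * (y[n-1] + x[n] - x[n-1]),
    with a = RC / (RC + dt) and RC = 1 / (2 * pi * cutoff_hz).
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)

    out = []
    prev_x = None
    prev_y = 0.0
    for x in signal:
        if prev_x is None:
            y = 0.0  # start from rest so a constant offset is removed at once
        else:
            y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

With this recurrence, a constant offset maps to zero and a slow linear drift is reduced to a small bounded offset, while faster fluctuations pass through largely unchanged. Choosing the cutoff is exactly the step that the abstract says is driven by the wavelet energy criterion, which this sketch does not attempt.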

41.1 · AI · May 31
GuidaPA: Privacy-Preserving Chatbot for Public Administration via Federated Learning

Daniel M. Jimenez-Gutierrez, Albenzio Cirillo, Raffaele Nicolussi et al.

We present GuidaPA, a privacy-preserving chatbot for the Italian Public Administration (PA) trained via Federated Learning (FL) on documentation from two national PA platforms, SIGESON and SIDFORS. Our corpus includes approximately 8 pages of SIGESON manuals and 31 pages of SIDFORS manuals/FAQs; while this study uses public documentation as a safe proxy, the intended deployment extends to restricted internal sources (e.g., tickets, officer manuals, database extracts) that cannot be centrally pooled due to regulatory and organizational constraints. GuidaPA integrates role-based access control, secure client-side preprocessing, explicit monitoring of non-IID effects, and parameter-efficient federated fine-tuning of large language models. Using QLoRA (4-bit) over 15 federated rounds with an 80/20 train-test split per client, we evaluate answer quality with ROUGE, BLEU-4, and METEOR. The best federated model achieves ROUGE-1/2/L of 61.10/55.77/59.44, BLEU-4 of 45.02, and METEOR of 63.94, close to private centralized fine-tuning while keeping data on-site. Compared to the general-purpose baseline, domain fine-tuning improves ROUGE-1 from 41.45 to 62.18 and BLEU-4 from 26.97 to 50.90. Overall, the results indicate that FL can deliver high-quality conversational AI for public services without centralized data sharing.

LG · Aug 23, 2022
Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals

Daniel Mauricio Jimenez Gutierrez, Hafiz Muuhammad Hassan, Lorella Landi et al.

Artificial Intelligence (AI)-based analysis of large, curated medical datasets is promising for providing early detection, faster diagnosis, and more effective treatment using information from low-power electrocardiography (ECG) monitoring devices. However, access to sensitive medical data from diverse sources is highly restricted, since improper use, unsafe storage, or data leakage could violate a person's privacy. This work uses a Federated Learning (FL) privacy-preserving methodology to train AI models over high-definition ECG data from 12-lead sensor arrays collected from six heterogeneous sources. We evaluated the capacity of the resulting models to achieve equivalent performance compared to state-of-the-art models trained in a Centralized Learning (CL) fashion. Moreover, we assessed the performance of our solution over independent and identically distributed (IID) and non-IID federated data. Our methodology involves machine learning techniques based on Deep Neural Networks and Long Short-Term Memory models. It has a robust data preprocessing pipeline with feature engineering, selection, and data balancing techniques. Our AI models demonstrated comparable performance to models trained using CL, IID, and non-IID approaches. They showcased advantages in reduced complexity and faster training time, making them well-suited for cloud-edge architectures.

ET · Jan 16
A Proof of Concept for a Digital Twin of an Ultrasonic Fermentation System

Francesco Saverio Sconocchia Pisoni, Andrea Vitaletti, Davide Appolloni et al.

This paper presents the design and implementation of a proof-of-concept digital twin for an innovative ultrasonic-enhanced beer-fermentation system, developed to enable intelligent monitoring, prediction, and actuation in yeast-growth environments. A traditional fermentation tank is equipped with a piezoelectric transducer able to irradiate the tank with ultrasonic waves, providing an external abiotic stimulus to enhance the growth of yeast and accelerate the fermentation process. At its core, the digital twin incorporates a predictive model that estimates yeast culture density over time based on the surrounding environmental conditions. To this end, we implement, tailor, and extend the model proposed in Palacios et al., using temperature, ultrasonic frequency, and duty cycle as inputs so that we can effectively handle the limited number of available training samples. The results obtained, along with the assessment of model performance, demonstrate the feasibility of the proposed approach.

LG · Dec 23, 2025
Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning

Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, Aris Anagnostopoulos et al.

Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distributed (non-IID) data across clients biases updates and degrades performance. To alleviate these issues, we propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify the level of non-IID data. We compute a weighted PSI metric, $WPSI^L$, which we show to be more informative than common non-IID metrics (Hellinger, Jensen-Shannon, and Earth Mover's distance). Using PSI features, we form distributionally homogeneous groups of clients via K-means++; the optimal number of clusters is chosen by a systematic silhouette-based procedure, typically yielding few clusters with modest overhead. Across six datasets (tabular, image, and text modalities), two partition protocols (Dirichlet with parameter $\alpha$ and Similarity with parameter $S$), and multiple client sizes, Clust-PSI-PFL delivers up to 18% higher global accuracy than state-of-the-art baselines and markedly improves client fairness, by a relative 37% under severe non-IID data. These results establish PSI-guided clustering as a principled, lightweight mechanism for robust PFL under label skew.
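As a rough illustration of the quantity behind this approach, the sketch below computes the standard, unweighted Population Stability Index between two label distributions, for example a client's labels versus the pooled labels. The abstract does not define the weighted variant $WPSI^L$, so it is not reproduced here; the helper names `label_distribution` and `psi` and the smoothing constant `eps` are assumptions made for this example.

```python
import math
from collections import Counter


def label_distribution(labels, classes, eps=1e-6):
    """Empirical class proportions, floored at eps so the logarithm is defined."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    raw = [max(counts.get(c, 0) / len(labels), eps) for c in classes]
    total = sum(raw)
    return [r / total for r in raw]  # renormalize after flooring


def psi(expected, actual):
    """Standard PSI: sum over classes of (a_i - e_i) * ln(a_i / e_i)."""
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same length")
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

Identical distributions give a PSI of 0, and the value grows as the distributions diverge; for instance, `psi([0.5, 0.5], [0.9, 0.1])` is roughly 0.88. Per-client PSI values of this kind could then serve as clustering features, although how the paper builds its features is not stated in the abstract.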

LG · Nov 19, 2024
Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics, Methods, Frameworks and Future Directions

Daniel M. Jimenez G., David Solans, Mikko Heikkila et al.

Recent advances in machine learning have highlighted Federated Learning (FL) as a promising approach that enables multiple distributed users (so-called clients) to collectively train ML models without sharing their private data. While this privacy-preserving method shows potential, it struggles when data across clients is not independent and identically distributed (non-IID). This remains an unsolved challenge that can result in poorer model performance and slower training times. Despite the significance of non-IID data in FL, there is a lack of consensus among researchers about its classification and quantification. This technical survey aims to fill that gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics to quantify data heterogeneity. Additionally, we describe popular solutions to address non-IID data and standardized frameworks employed in FL with heterogeneous data. Based on our state-of-the-art survey, we present key lessons learned and suggest promising future research directions.

CR · Aug 19, 2025
On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions

Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, Jose L. Hernandez-Ramos et al.

Federated Learning (FL) is an emerging distributed machine learning paradigm enabling multiple clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. This survey provides a comprehensive overview of more than 200 papers regarding the state-of-the-art attacks and defense mechanisms developed to address these challenges, categorizing them into security-enhancing and privacy-preserving techniques. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks. At the same time, privacy-preserving techniques focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation. We critically analyze the strengths and limitations of existing methods, highlight the trade-offs between privacy, security, and model performance, and discuss the implications of non-IID data distributions on the effectiveness of these defenses. Furthermore, we identify open research challenges and future directions, including the need for scalable, adaptive, and energy-efficient solutions operating in dynamic and heterogeneous FL environments. Our survey aims to guide researchers and practitioners in developing robust and privacy-preserving FL systems, fostering advances that safeguard the integrity and confidentiality of collaborative learning frameworks.

LG · Mar 21, 2025
A Thorough Assessment of the Non-IID Data Impact in Federated Learning

Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, Aris Anagnostopoulos et al.

Federated learning (FL) allows collaborative machine learning (ML) model training across decentralized clients while keeping their data private. Because of its decentralized nature, FL must deal with non-independent and identically distributed (non-IID) data. This open problem has notable consequences, such as decreased model performance and longer convergence times. Despite its importance, experimental studies systematically addressing all types of data heterogeneity (a.k.a. non-IIDness) remain scarce. We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis. We use the Hellinger Distance (HD) to measure differences in distribution among clients. Our study benchmarks four state-of-the-art strategies for handling non-IID data across label, feature, quantity, and spatiotemporal skew, under realistic and controlled conditions. This is the first comprehensive analysis of the spatiotemporal skew effect in FL. Our findings highlight the significant impact of label and spatiotemporal skew on FL model performance, with notable performance drops occurring at specific HD thresholds. Additionally, FL performance is heavily affected mainly when the non-IIDness is extreme. Thus, we provide recommendations for FL research to tackle data heterogeneity effectively. Our work represents the most extensive examination of non-IIDness in FL, offering a robust foundation for future research.
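For reference, the Hellinger distance between two discrete distributions $P$ and $Q$ is $H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_i (\sqrt{p_i} - \sqrt{q_i})^2}$, which lies in $[0, 1]$. The sketch below implements this textbook definition; the function name `hellinger` is an assumption for illustration, and how the study aggregates distances across clients is not specified in the abstract.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Both inputs are sequences of non-negative probabilities over the same
    classes, each summing to 1. Returns a value in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2.0)
```

Two clients with the same label distribution are at distance 0, while two clients with disjoint label sets are at distance 1, so the metric gives a bounded scale on which thresholds can be compared.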

CR · Oct 6, 2021
Empowering Citizens by a Blockchain-Based Robinson List

Albenzio Cirillo, Vito Dalena, Antonio Mauro et al.

A Robinson list protects phone subscribers against commercial spam calls. Its most basic function is to collect subscribers' refusals to be contacted by market operators. Nowadays, Robinson lists run as centralised services, which implies that citizens must trust third parties with the management of their choices. In this paper, we show a design that allows us to realise a Robinson list as a decentralised service. Our work leverages the experience developed by Fondazione Ugo Bordoni as the manager of the Italian Robinson list. We present a general solution and a proof-of-concept (PoC) adopting the Algorand technology. We evaluate the performance of our PoC in terms of its scalability and of the latency perceived by the involved actors. We also discuss aspects related to identity management and privacy.

HC · Jun 20, 2019
Scenarios for Educational and Game Activities using Internet of Things Data

Chrysanthi Tziortzioti, Irene Mavrommati, Georgios Mylonas et al.

Raising awareness among young people and changing their behavior and habits concerning energy usage and the environment is key to achieving a sustainable planet. Addressing the global climate problem requires informing the population about their roles in mitigation actions and the adoption of sustainable behaviors. Addressing climate change and achieving ambitious energy and climate targets requires a change in citizen behavior and consumption practices. This paper examines IoT sensing and related scenarios and practices that address school children via discovery, gamification, and educational activities. These educational scenarios include the use of seawater sensors in STEM education, which has not previously been addressed.

BIO-PH · May 13, 2017
Comparison of Decision Tree Based Classification Strategies to Detect External Chemical Stimuli from Raw and Filtered Plant Electrical Response

Shre Kumar Chatterjee, Saptarshi Das, Koushik Maharatna et al.

Plants monitor their surrounding environment and control their physiological functions by producing an electrical response. We recorded electrical signals from different plants by exposing them to Sodium Chloride (NaCl), Ozone (O3) and Sulfuric Acid (H2SO4) under laboratory conditions. After applying pre-processing techniques such as filtering and drift removal, we extracted a few statistical features from the acquired plant electrical signals. Using these features, combined with different classification algorithms, we used a decision tree based multi-class classification strategy to identify the three different external chemical stimuli. Here we present our exploration to obtain the optimum ranked feature and classifier combination that can separate a particular chemical stimulus from the incoming stream of plant electrical signals. The paper also reports an exhaustive comparison of similar feature-based classification using the filtered and the raw plant signals as two different cases for feature extraction; the raw signals contain both the high-frequency stochastic part and the low-frequency trends. The work presented in this paper opens up new possibilities for using plant electrical signals to monitor and detect other environmental stimuli apart from NaCl, O3 and H2SO4 in the future.

BIO-PH · Nov 29, 2016
Exploring Strategies for Classification of External Stimuli Using Statistical Features of the Plant Electrical Response

Shre Kumar Chatterjee, Saptarshi Das, Koushik Maharatna et al.

Plants sense their environment by producing electrical signals which in essence represent changes in underlying physiological processes. These electrical signals, when monitored, show both stochastic and deterministic dynamics. In this paper, we compute 11 statistical features from the raw non-stationary plant electrical signal time series to classify the stimulus applied (causing the electrical signal). By using different discriminant analysis based classification techniques, we successfully establish that there is enough information in the raw electrical signal to classify the stimuli. In the process, we also propose two standard features which consistently give good classification results for three types of stimuli: Sodium Chloride (NaCl), Sulphuric Acid (H2SO4) and Ozone (O3). This may reduce the complexity involved in computing all the features for online classification of similar external stimuli in the future.