NI · May 22
Purification Strategy Optimization for Entanglement Routing in Quantum Networks
Javier Vecino Peñas, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo et al.
Quantum networks rely on the efficient distribution of entanglement to enable long-distance quantum communication and information processing. A key challenge in these networks is the design of routing protocols capable of maintaining high-quality entanglement in the presence of noise, decoherence, and imperfect operations, which progressively degrade the fidelity of entangled states as they are extended through entanglement swapping. Entanglement purification provides an effective mechanism to mitigate this degradation at the cost of additional resources. In this work, we study purification-aware quantum routing and formulate the problem of selecting optimal purification strategies as an optimization task. By employing dynamic programming techniques, we identify strategies that optimally balance resource consumption and end-to-end fidelity, demonstrating the effectiveness of our approach across different scenarios.
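To make the dynamic-programming idea concrete, here is a minimal Python sketch. It is not the authors' formulation. It assumes Werner-state links, the standard BBPSSW recurrence formula for one successful purification round, ideal entanglement swapping, and a cost model in which k purification rounds on a link consume 2**k raw pairs (failures ignored). The names `purify`, `swap` and `best_strategy` are hypothetical. The swap fidelity increases with each input whenever the other input exceeds 1/4. For that reason, keeping only the best fidelity per (prefix of links, remaining budget) state is enough for optimality.

```python
from functools import lru_cache


def purify(f):
    """One successful BBPSSW round on two Werner pairs of fidelity f."""
    e = (1 - f) / 3
    return (f * f + e * e) / (f * f + 2 * f * e + 5 * e * e)


def swap(f1, f2):
    """Fidelity after an ideal entanglement swap of two Werner pairs."""
    return f1 * f2 + (1 - f1) * (1 - f2) / 3


def best_strategy(link_fids, budget, max_rounds=3):
    """Pick purification rounds per link to maximize end-to-end fidelity.

    Assumption: k recurrence rounds on a link consume 2**k raw pairs.
    Returns (fidelity, rounds_per_link), or None if the budget is infeasible.
    """
    if any(f <= 0.5 for f in link_fids):
        raise ValueError("purification only helps for Werner fidelity > 0.5")
    # options[i] lists (raw pairs used, resulting fidelity, rounds) for link i
    options = []
    for f in link_fids:
        opts, g = [], f
        for k in range(max_rounds + 1):
            opts.append((2 ** k, g, k))
            g = purify(g)
        options.append(opts)

    @lru_cache(maxsize=None)
    def solve(i, remaining):
        # Best fidelity of the segment spanning links 0..i within `remaining` pairs.
        best = None
        for cost, g, k in options[i]:
            if cost > remaining:
                continue
            if i == 0:
                cand = (g, (k,))
            else:
                prev = solve(i - 1, remaining - cost)
                if prev is None:
                    continue
                cand = (swap(prev[0], g), prev[1] + (k,))
            if best is None or cand[0] > best[0]:
                best = cand
        return best

    return solve(len(link_fids) - 1, budget)
```

For example, `best_strategy([0.9, 0.9], 2)` can only afford one raw pair per link, so it returns rounds `(0, 0)`. A budget of 4 lets both links be purified once.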
LG · Sep 26, 2024
Byzantine-Robust Aggregation for Securing Decentralized Federated Learning
Diego Cajaraville-Aboy, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo et al.
Federated Learning (FL) has emerged as a distributed machine learning approach that addresses privacy concerns by training AI models locally on devices. Decentralized Federated Learning (DFL) extends the FL paradigm by eliminating the central server, thereby enhancing scalability and robustness through the avoidance of a single point of failure. However, securing DFL remains a significant challenge, as most Byzantine-robust algorithms proposed in the literature are designed for centralized scenarios. In this paper, we present WFAgg, a novel Byzantine-robust aggregation algorithm that enhances the security of Decentralized Federated Learning environments. This proposal simultaneously copes with the adverse conditions of dynamic decentralized topologies and strengthens their robustness by employing multiple filters to identify and mitigate Byzantine attacks. Experimental results demonstrate the effectiveness of the proposed algorithm in maintaining model accuracy and convergence in the presence of various Byzantine attack scenarios, outperforming state-of-the-art centralized Byzantine-robust aggregation schemes (such as Multi-Krum or Clustering). These algorithms are evaluated on an IID image classification problem in both centralized and decentralized scenarios.
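WFAgg itself is not specified in this abstract. As a reference point, the sketch below implements the Multi-Krum baseline named above. Each update is scored by the sum of squared distances to its n - f - 2 nearest other updates, and the m lowest-scoring updates are averaged. Updates are plain Python lists, and `multi_krum` with its arguments is a hypothetical name. In a decentralized setting, each node would apply such a rule to the updates received from its neighbors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_krum(updates, f, m):
    """Multi-Krum: average the m updates with the lowest Krum scores.

    f is the assumed maximum number of Byzantine updates.
    Returns (aggregated vector, indices of selected updates).
    """
    n = len(updates)
    closest = n - f - 2
    if closest <= 0:
        raise ValueError("Krum requires n > f + 2")
    scores = []
    for i, u in enumerate(updates):
        # Distances from update i to every other update, nearest first.
        dists = sorted(sq_dist(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:closest]))
    chosen = sorted(range(n), key=lambda i: scores[i])[:m]
    dim = len(updates[0])
    agg = [sum(updates[i][d] for i in chosen) / m for d in range(dim)]
    return agg, chosen
```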
LG · Nov 23, 2023
A Blockchain Solution for Collaborative Machine Learning over IoT
Carlos Beis-Penedo, Francisco Troncoso-Pastoriza, Rebeca P. Díaz-Redondo et al.
The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
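This abstract does not give XuILVQ's exact update rules. As an illustration of prototype-based incremental learning in general, the sketch below uses a plain LVQ1-style rule, which is an assumption on our part. The nearest prototype moves toward a sample of its own class and away from a sample of another class, and a sample of an unseen class becomes a new prototype. The class `SimpleLVQ` is hypothetical. In the described architecture, prototypes like these, rather than raw data, are what would be shared and stored.

```python
import math


class SimpleLVQ:
    """Hypothetical LVQ1-style incremental learner (not XuILVQ itself)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.prototypes = []  # list of (vector, label)

    def predict(self, x):
        if not self.prototypes:
            return None
        _, label = min(self.prototypes, key=lambda p: math.dist(p[0], x))
        return label

    def learn_one(self, x, y):
        # A sample of an unseen class becomes a new prototype.
        if not any(lbl == y for _, lbl in self.prototypes):
            self.prototypes.append((list(x), y))
            return
        idx = min(range(len(self.prototypes)),
                  key=lambda i: math.dist(self.prototypes[i][0], x))
        vec, label = self.prototypes[idx]
        # Attract on a label match, repel otherwise (LVQ1).
        sign = 1.0 if label == y else -1.0
        new_vec = [v + sign * self.lr * (xi - v) for v, xi in zip(vec, x)]
        self.prototypes[idx] = (new_vec, label)
```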
LG · Jul 12, 2024
Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach
Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo
Continual learning (CL) poses the important challenge of adapting to evolving data distributions, consolidating new knowledge without forgetting previously acquired knowledge. In this paper, we introduce a new methodology, coined Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data without storing old samples. After tests to determine the appropriate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms other options in the literature while using only 50% synthetic data.
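Here is a minimal sketch of the pseudorehearsal idea. It makes two assumptions. First, synthetic samples are drawn by adding Gaussian noise around stored class prototypes, as a stand-in for XuILVQ's generator, which is not described here. Second, "50% synthetic data" is taken to mean half of each training batch. `generate_pseudo_samples` and `rehearsal_batch` are hypothetical helpers. The resulting batch would be fed to the incremental classifier (DNDF in TRIL3).

```python
import random


def generate_pseudo_samples(prototypes, n, sigma=0.1, rng=None):
    """Draw n synthetic (vector, label) samples around stored prototypes."""
    rng = rng or random.Random(0)
    out = []
    for _ in range(n):
        vec, label = rng.choice(prototypes)
        out.append(([v + rng.gauss(0.0, sigma) for v in vec], label))
    return out


def rehearsal_batch(new_data, prototypes, synthetic_fraction=0.5, rng=None):
    """Mix new-task samples with synthetic old-task samples.

    synthetic_fraction is the share of the final batch that is synthetic.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = rng or random.Random(0)
    n_syn = round(len(new_data) * synthetic_fraction / (1.0 - synthetic_fraction))
    batch = list(new_data) + generate_pseudo_samples(prototypes, n_syn, rng=rng)
    rng.shuffle(batch)
    return batch
```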
LG · Nov 3, 2025
Verifiable Split Learning via zk-SNARKs
Rana Alaa, Darío González-Ferreiro, Carlos Beis-Penedo et al.
Split learning is an approach to collaborative learning in which a deep neural network is divided at a cut layer into two parts: a client side and a server side. The client side executes its part of the model on its raw input data and sends the intermediate activations to the server side. This architecture is very useful for enabling collaborative training when data or resources are separated across devices. However, split learning lacks the ability to verify the correctness and honesty of the computations performed and exchanged between the parties. To this end, this paper proposes a verifiable split learning framework that integrates zk-SNARK proofs to ensure correctness and verifiability. zk-SNARK proofs are generated and verified for both sides in forward propagation and for backward propagation on the server side, guaranteeing verifiability on both sides. The verifiable split learning architecture is compared with a blockchain-enabled system for the same deep learning network, one that records updates but does not generate zero-knowledge proofs. The comparison shows that applying zk-SNARK proofs achieves verifiability and correctness, whereas blockchains are lightweight but unverifiable.
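The client/server split can be sketched in plain Python for a two-layer network cut after the hidden ReLU layer. In the forward pass only the activations `h` cross the cut, and in the backward pass only the gradient with respect to `h` crosses it. The sketch does not model the zk-SNARK proof generation and verification that the paper adds around these exchanges. All function names and the squared-error loss are assumptions for illustration.

```python
def client_forward(W1, x):
    """Client side: hidden layer z = W1 x, h = ReLU(z). Only h is sent."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h = [max(0.0, v) for v in z]
    return z, h


def server_step(w2, h, y, lr):
    """Server side: y_hat = w2 . h, loss = 0.5 * (y_hat - y)**2."""
    y_hat = sum(w * hi for w, hi in zip(w2, h))
    err = y_hat - y                      # dL/dy_hat
    grad_h = [err * w for w in w2]       # sent back across the cut layer
    new_w2 = [w - lr * err * hi for w, hi in zip(w2, h)]
    return new_w2, grad_h, 0.5 * err * err


def client_backward(W1, x, z, grad_h, lr):
    """Client side: backpropagate grad_h through the ReLU into W1."""
    new_W1 = []
    for row, zj, gj in zip(W1, z, grad_h):
        g = gj if zj > 0 else 0.0        # ReLU derivative
        new_W1.append([w - lr * g * xi for w, xi in zip(row, x)])
    return new_W1


def train_step(W1, w2, x, y, lr=0.05):
    z, h = client_forward(W1, x)         # raw x never leaves the client
    w2, grad_h, loss = server_step(w2, h, y, lr)
    W1 = client_backward(W1, x, z, grad_h, lr)
    return W1, w2, loss
```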
CL · Dec 12, 2023
Deep Learning-based Sentiment Classification: A Comparative Survey
Mohamed Kayed, Rebeca P. Díaz-Redondo, Alhassan Mabrouk
Recently, Deep Learning (DL) approaches have been applied to the Sentiment Classification (SC) problem, a core task in review mining or Sentiment Analysis (SA). The performance of these approaches is affected by several factors. This paper identifies these factors and classifies them into three categories: data-preparation-based factors, feature-representation-based factors, and classification-technique-based factors. The paper is a comprehensive literature-based survey that compares the performance of more than 100 DL-based SC approaches on 21 public datasets of customer reviews from three specific application domains (products, movies, and restaurants). These 21 datasets have different characteristics (balanced/imbalanced, size, etc.), providing a broad view for the study. The comparison explains how the proposed factors quantitatively affect the performance of the studied DL-based SC approaches.
CL · Dec 12, 2023
SEOpinion: Summarization and Exploration Opinion of E-Commerce Websites
Alhassan Mabrouk, Rebeca P. Díaz-Redondo, Mohammed Kayed
E-Commerce (EC) websites provide a large amount of useful information that exceeds human cognitive processing ability. To help customers compare alternatives when buying a product, previous studies designed opinion summarization systems based on customer reviews. These studies ignored the template information provided by manufacturers, even though this descriptive information covers many product aspects or characteristics. Therefore, this paper proposes a methodology coined SEOpinion (Summarization and Exploration of Opinions), which summarizes the product aspects and spots the opinions regarding them by combining template information with customer reviews in two main phases. First, the Hierarchical Aspect Extraction (HAE) phase creates a hierarchy of product aspects from the template. Subsequently, the Hierarchical Aspect-based Opinion Summarization (HAOS) phase enriches this hierarchy with customers' opinions, to be shown to other potential buyers. To test the feasibility of using Deep Learning-based BERT techniques with our approach, we created a corpus by gathering information from the top five EC websites for laptops. The experimental results show that a Recurrent Neural Network (RNN) achieves better results (77.4% and 82.6% F1-measure for the first and second phases, respectively) than a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM).
ML · Dec 16, 2025
Continual Learning at the Edge: An Agnostic IIoT Architecture
Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo et al.
The exponential growth of Internet-connected devices has challenged traditional centralized computing systems because of latency and bandwidth limitations. Edge computing has evolved to address these difficulties by bringing computation closer to the data source. However, traditional machine learning algorithms are not well suited to edge-computing systems, where data usually arrives dynamically and continually; incremental learning offers a good solution for these settings. We introduce a new approach that applies the incremental learning philosophy within an edge-computing scenario in the industrial sector with a specific purpose: real-time quality control in a manufacturing system. By applying continual learning, we reduce the impact of catastrophic forgetting and provide an efficient and effective solution.
SI · Dec 13, 2023
A hybrid analysis of LBSN data to early detect anomalies in crowd dynamics
Rebeca P. Díaz-Redondo, Carlos Garcia-Rubio, Ana Fernández Vilas et al.
Undoubtedly, Location-based Social Networks (LBSNs) provide an interesting source of geo-located data that we have previously used to obtain patterns of the dynamics of crowds throughout urban areas. According to our previous results, activity in LBSNs reflects the real activity in the city. Therefore, unexpected behaviors in social media activity are trustworthy evidence of unexpected changes in the activity of the city. In this paper, we introduce a hybrid solution to detect these changes early by combining two approaches, entropy analysis and clustering techniques, applied to the data gathered from LBSNs. In particular, we performed our experiments on a dataset collected from Instagram over seven months in New York City, obtaining promising results.
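The entropy half of the hybrid approach can be illustrated as follows. The sketch relies on assumptions the abstract does not state. Activity in each time window is summarized as post counts per urban zone, and the Shannon entropy of that distribution is compared with historical windows. A window is flagged when its entropy deviates from the historical mean by more than k standard deviations. The clustering component is omitted, and `flag_anomalies` with its threshold `k` is hypothetical.

```python
import math
from statistics import mean, stdev


def shannon_entropy(counts):
    """Entropy (bits) of the activity distribution over zones."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def flag_anomalies(windows, history, k=3.0):
    """Flag windows whose entropy is more than k std devs from the history."""
    hist = [shannon_entropy(w) for w in history]
    mu, sd = mean(hist), stdev(hist)
    return [sd > 0 and abs(shannon_entropy(w) - mu) > k * sd for w in windows]
```

A sudden concentration of posts in one zone (for example a large event) drives the entropy toward 0, which this rule flags.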
DC · Dec 11, 2023
Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
Mohamed S. Halawa, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas
Performance analysis is an essential task in High-Performance Computing (HPC) systems and is applied for purposes such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of Key Performance Indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs provide data about CPU usage, memory usage, network (interface) traffic, and other sensors that monitor the hardware. By analyzing these data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper is to identify which metrics (KPIs) are the most appropriate for identifying and classifying different types of jobs according to their behavior in the HPC system. To this end, we applied different clustering techniques (partitional and hierarchical clustering algorithms) to a real dataset from the Galician Computation Center (CESGA). We conclude that (i) the metrics (KPIs) related to network (interface) traffic monitoring provide the best cohesion and separation for clustering HPC jobs, and (ii) hierarchical clustering algorithms are the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.
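Cohesion and separation are commonly summarized with the silhouette coefficient. The abstract does not name it explicitly, so the sketch below assumes it as the evaluation measure. For each job, a is the mean distance to its own cluster and b is the smallest mean distance to any other cluster, giving s = (b - a) / max(a, b). Singleton clusters score 0. The points stand for hypothetical per-job KPI feature vectors.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient; labels is a list parallel to points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)          # singleton cluster convention
            continue
        a = sum(same) / len(same)       # cohesion
        b = min(                        # separation: nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```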
CV · Dec 18, 2023
Use of BIM Data as Input and Output for Improved Detection of Lighting Elements in Buildings
Francisco Troncoso-Pastoriza, Pablo Eguía-Oller, Rebeca P. Díaz-Redondo et al.
This paper introduces a complete method for the automatic detection, identification and localization of lighting elements in buildings. The method leverages the available building information modeling (BIM) data of a building and feeds the newly collected information, which is key for energy-saving strategies, back into the BIM model. The detection system substantially improves on our previous work, with two main contributions: (i) a new refinement algorithm that provides a better detection rate and identification performance with comparable computational resources, and (ii) a new plane estimation, filtering and projection step that leverages the BIM information earlier for lamps that are both hanging and embedded. The two modifications are thoroughly tested in five different case studies, yielding better results in terms of detection, identification and localization.
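One ingredient of a plane estimation and projection step is intersecting a camera ray with a plane taken from the BIM model, such as a ceiling. The sketch below assumes a pinhole camera with intrinsics fx, fy, cx, cy and a plane n·X = d expressed in camera coordinates. It illustrates that geometry only and is not the paper's pipeline. `pixel_ray` and `intersect_plane` are hypothetical names.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Direction (camera frame) of the ray through pixel (u, v), with z = 1."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_plane(ray, normal, d):
    """Point where the ray t*ray (t > 0) meets the plane normal . X = d.

    Returns None if the ray is parallel to the plane or the plane is behind
    the camera.
    """
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None
    t = d / denom
    if t <= 0:
        return None
    return tuple(t * r for r in ray)
```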
CV · Dec 18, 2023
Generation of BIM data based on the automatic detection, identification and localization of lamps in buildings
Francisco Troncoso-Pastoriza, Pablo Eguía-Oller, Rebeca P. Díaz-Redondo et al.
In this paper, we introduce a method that supports the detection, identification and localization of lamps in a building, with the main goal of automatically feeding its energy model by means of Building Information Modeling (BIM) methods. The proposed method thus provides useful information for applying energy-saving strategies that reduce energy consumption in the building sector through the correct management of the lighting infrastructure. Based on the unique geometry and brightness of lamps and using only greyscale images, our methodology obtains accurate results despite its low computational needs, resulting in near-real-time processing. The main novelty is that the candidate search focuses not on the entire image but only on a limited region that summarizes the specific characteristics of the lamp. The information obtained from our approach was used with the Green Building XML Schema to illustrate the automatic generation of BIM data from the results of the algorithm.
SI · Dec 18, 2023
Discovering Geo-dependent Stories by Combining Density-based Clustering and Thread-based Aggregation techniques
Héctor Cerezo-Costas, Ana Fernández Vilas, Manuela Martín-Vicente et al.
Citizens actively interact with their surroundings, especially through social media. Not only do shared posts give important information about what is happening (from the users' perspective), but the metadata linked to these posts also offer relevant data, such as the GPS location in Location-based Social Networks (LBSNs). In this paper, we introduce a global analysis of geo-tagged posts in social media that supports (i) the detection of unexpected behavior in the city and (ii) the analysis of the posts to infer what is happening. The former is obtained by applying density-based clustering techniques, whereas the latter results from applying natural language processing. We applied our methodology to a dataset obtained from Instagram activity in New York City over seven months, obtaining promising results. The developed algorithms require very few resources and can analyze millions of data points on commodity hardware in less than one hour without complex parallelization techniques. Furthermore, the solution can be easily adapted to other geo-tagged data sources without extra effort.
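Density-based clustering of geo-tagged posts can be illustrated with a compact DBSCAN. The sketch assumes the coordinates have already been projected to a planar system, so Euclidean distance is meaningful. The parameter values and the `dbscan` helper are hypothetical, and the paper may use a different density-based algorithm. Label -1 marks noise.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id per point (-1 = noise). min_pts counts the point itself."""
    labels = [None] * len(points)      # None = not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1             # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also core: keep growing
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```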
LG · Dec 19, 2023
Decentralised and collaborative machine learning framework for IoT
Martín González-Soto, Rebeca P. Díaz-Redondo, Manuel Fernández-Veiga et al.
Decentralised machine learning has recently been proposed as a potential solution to the security issues of the canonical federated learning approach. In this paper, we propose a decentralised and collaborative machine learning framework specially oriented to resource-constrained devices, which are common in IoT deployments. To this end, we propose the following building blocks. First, an incremental learning algorithm based on prototypes that was specifically implemented to work on low-performance computing elements. Second, two random-based protocols to exchange the local models among the computing elements in the network. Finally, two algorithmic approaches for prediction and prototype creation. This proposal was compared with a typical centralized incremental learning approach in terms of accuracy, training time and robustness, with very promising results.
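This abstract does not detail the two random-based exchange protocols. As one generic example of a random protocol, the sketch below uses randomized pairwise gossip: at each step, two randomly chosen nodes replace their parameter vectors with the average of the two. Pairwise averaging preserves the network-wide mean, so the nodes drift toward consensus. `gossip_round` and `run_gossip` are hypothetical, and averaging raw vectors is a simplification for prototype-based models.

```python
import random


def gossip_round(models, rng):
    """Two random nodes average their parameter vectors in place."""
    i, j = rng.sample(range(len(models)), 2)
    avg = [(a + b) / 2 for a, b in zip(models[i], models[j])]
    models[i] = list(avg)
    models[j] = list(avg)


def run_gossip(models, steps, seed=0):
    """Run `steps` pairwise exchanges on a copy of the models."""
    rng = random.Random(seed)
    models = [list(m) for m in models]
    for _ in range(steps):
        gossip_round(models, rng)
    return models
```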
DC · Mar 12
Decentralized Orchestration Architecture for Fluid Computing: A Secure Distributed AI Use Case
Diego Cajaraville-Aboy, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo et al.
Distributed AI and IoT applications increasingly execute across heterogeneous resources spanning end devices, edge/fog infrastructure, and cloud platforms, often under different administrative domains. Fluid Computing has emerged as a promising paradigm for enhancing massive resource management across the computing continuum by treating such resources as a unified fabric, enabling optimal service-agnostic deployments driven by application requirements. However, existing solutions remain largely centralized and often do not explicitly address multi-domain considerations. This paper proposes an agnostic multi-domain orchestration architecture for fluid computing environments. The orchestration plane enables decentralized coordination among domains that maintain local autonomy while jointly realizing intent-based deployment requests from tenants, ensuring end-to-end placement and execution. To this end, the architecture elevates domain-side control services as first-class capabilities to support application-level enhancement at runtime. As a representative use case, we consider a multi-domain Decentralized Federated Learning (DFL) deployment under Byzantine threats. We leverage domain-side capabilities to enhance Byzantine security by introducing FU-HST, an SDN-enabled multi-domain anomaly detection mechanism that complements Byzantine-robust aggregation. We validate the approach via simulation in single- and multi-domain settings, evaluating anomaly detection, DFL performance, and computation/communication overhead.
CV · Dec 18, 2023
Orientation-Constrained System for Lamp Detection in Buildings Based on Computer Vision
Francisco Troncoso-Pastoriza, Pablo Eguía-Oller, Rebeca P. Díaz-Redondo et al.
Computer vision is used in this work to detect lighting elements in buildings with the goal of improving the accuracy of previous methods to provide a precise inventory of the location and state of lamps. Using the framework developed in our previous works, we introduce two new modifications to enhance the system: first, a constraint on the orientation of the detected poses in the optimization methods for both the initial and the refined estimates based on the geometric information of the building information modelling (BIM) model; second, an additional reprojection error filtering step to discard the erroneous poses introduced with the orientation restrictions, keeping the identification and localization errors low while greatly increasing the number of detections. These enhancements are tested in five different case studies with more than 30,000 images, with results showing improvements in the number of detections, the percentage of correct model and state identifications, and the distance between detections and reference positions.
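The reprojection error filtering step can be sketched in two parts. First, project known 3D lamp points (for instance, from the BIM model) under each candidate pose. Then discard poses whose mean pixel error exceeds a threshold. The pinhole model, the 2-pixel default threshold and the function names are assumptions. The sketch does not model the orientation constraint itself.

```python
import math


def project(point, R, t, fx, fy, cx, cy):
    """Pinhole projection of a 3D point under pose (R, t): X_c = R X + t."""
    xc = [sum(R[r][k] * point[k] for k in range(3)) + t[r] for r in range(3)]
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_error(model_pts, image_pts, R, t, intr):
    """Mean pixel distance between projected model points and detections."""
    errs = [math.dist(project(p, R, t, *intr), q)
            for p, q in zip(model_pts, image_pts)]
    return sum(errs) / len(errs)


def filter_poses(candidates, model_pts, image_pts, intr, max_err_px=2.0):
    """Keep only candidate (R, t) poses whose mean error is within the threshold."""
    return [(R, t) for R, t in candidates
            if reprojection_error(model_pts, image_pts, R, t, intr) <= max_err_px]
```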
AI · Dec 11, 2023
KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction Approach
Mohamed Soliman Halawa, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas
High-Performance Computing (HPC) systems need to be constantly monitored to ensure their stability. The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc. A proper analysis of these data, usually stored as time series, can provide insight into choosing the right management strategies as well as into the early detection of issues. In this paper, we introduce a methodology to cluster HPC jobs according to their KPIs. Our approach reduces the inherent high dimensionality of the collected data by applying two techniques to the time series: literature-based and variance-based feature extraction. We also define a procedure to visualize the obtained clusters by combining the two previous approaches with Principal Component Analysis (PCA). Finally, we validated our contributions on a real dataset, concluding that the KPIs related to CPU usage provide the best cohesion and separation for clustering analysis, and that our visualization methodology yields good results.
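The exact feature set is not given here. The sketch below assumes a simple reading of variance-based feature extraction. Each KPI time series is summarized by a few statistics, and only the statistics with the highest variance across jobs are kept before clustering or PCA. `FEATURES`, `extract_features` and `top_variance_features` are hypothetical. Variance is scale-dependent, so features from KPIs with different units would normally be standardized first.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical summary statistics computed for each KPI time series.
FEATURES = {"mean": mean, "std": pstdev, "min": min, "max": max}


def extract_features(series):
    """Summarize one job's KPI time series as a dict of statistics."""
    return {name: fn(series) for name, fn in FEATURES.items()}


def top_variance_features(jobs, k):
    """Keep the k statistics whose values vary most across jobs.

    Returns (kept feature names, reduced feature matrix, one row per job).
    """
    rows = [extract_features(s) for s in jobs]
    variances = {name: pvariance([r[name] for r in rows]) for name in FEATURES}
    keep = sorted(variances, key=variances.get, reverse=True)[:k]
    return keep, [[r[name] for name in keep] for r in rows]
```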
LG · Jun 9, 2025
Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator
Alberto Bazán-Guillén, Carlos Beis-Penedo, Diego Cajaraville-Aboy et al.
Realistic urban traffic simulation is essential for sustainable urban planning and the development of intelligent transportation systems. However, generating high-fidelity, time-varying traffic profiles that accurately reflect real-world conditions, especially in large-scale scenarios, remains a major challenge. Existing methods are often limited in accuracy or scalability, or raise privacy concerns due to centralized data processing. This work introduces DesRUTGe (Decentralized Realistic Urban Traffic Generator), a novel framework that integrates Deep Reinforcement Learning (DRL) agents with the SUMO simulator to generate realistic 24-hour traffic patterns. A key innovation of DesRUTGe is its use of Decentralized Federated Learning (DFL), wherein each traffic detector and its corresponding urban zone function as an independent learning node. These nodes train local DRL models using minimal historical data and collaboratively refine their performance by exchanging model parameters with selected peers (e.g., geographically adjacent zones), without requiring a central coordinator. Evaluated using real-world data from the city of Barcelona, DesRUTGe outperforms standard SUMO-based tools such as RouteSampler, as well as other centralized learning approaches, by delivering more accurate and privacy-preserving traffic pattern generation.
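The peer-restricted exchange can be illustrated by one synchronous aggregation round. In that round, every node averages its parameters with those of its selected peers, for example geographically adjacent zones. The dictionary-based graph, equal weights and the name `neighbor_average` are assumptions. The local DRL training that would happen between rounds is omitted.

```python
def neighbor_average(params, neighbors):
    """One synchronous round: each node averages with its selected peers.

    params: node -> parameter vector; neighbors: node -> list of peer nodes.
    Returns new parameters without mutating the input.
    """
    new = {}
    for node, vec in params.items():
        group = [vec] + [params[p] for p in neighbors.get(node, [])]
        new[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return new
```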
LG · May 10, 2025
Privacy-aware Berrut Approximated Coded Computing applied to general distributed learning
Xavier Martínez-Luaña, Manuel Fernández-Veiga, Rebeca P. Díaz-Redondo et al.
Coded computing is one of the techniques that can be used for privacy protection in Federated Learning. However, most coded computing constructions work only under the assumption that the computations involved are exact, are generally restricted to special classes of functions, and require quantized inputs. This paper considers the use of Private Berrut Approximate Coded Computing (PBACC) as a general solution to add strong but non-perfect privacy to federated learning. We derive new adapted PBACC algorithms for centralized aggregation, secure distributed training with centralized data, and secure decentralized training with decentralized data, thus significantly enlarging the applications of the method and the existing privacy protection tools available for these paradigms. In particular, PBACC can be used robustly to attain privacy guarantees in decentralized federated learning for a variety of models. Our numerical results show that the achievable quality of different learning models (convolutional neural networks, variational autoencoders, and Cox regression) is minimally altered by using these new computing schemes, and that the privacy leakage can be strictly bounded to less than a fraction of one bit per participant. Additionally, the computational cost of the encoding and decoding processes depends only on the degree of decentralization of the data.
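For intuition about the Berrut-based scheme, the sketch below implements Berrut's rational interpolant:

r(z) = [Σ_i w_i f_i / (z - x_i)] / [Σ_i w_i / (z - x_i)], with w_i = (-1)^i.

It uses this interpolant to encode scalar inputs into worker shares and to decode approximate results. The Chebyshev-type evaluation points and the function names are assumptions. The sketch omits the random masking that makes the scheme private in PBACC, so it covers only the approximate coded computing part. With alternating weights, the nodes must be sorted (here they come out monotone from the cosine).

```python
import math


def berrut(xs, ys, z):
    """Berrut's rational interpolant through (xs[i], ys[i]) evaluated at z.

    xs must be sorted (increasing or decreasing) for the (-1)**i weights.
    """
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if z == x:
            return y                   # exact interpolation at the nodes
        w = (-1) ** i / (z - x)
        num += w * y
        den += w
    return num / den


def bacc(data, f, n_workers):
    """Approximate [f(d) for d in data] via Berrut encoding/decoding (no privacy)."""
    k = len(data)
    alphas = [math.cos((2 * j + 1) * math.pi / (2 * k)) for j in range(k)]
    betas = [math.cos(i * math.pi / (n_workers - 1)) for i in range(n_workers)]
    shares = [berrut(alphas, data, b) for b in betas]   # one share per worker
    results = [f(s) for s in shares]                    # workers apply f
    return [berrut(betas, results, a) for a in alphas]  # decode at alphas
```

The decoded values are approximations of f(data[j]) whose accuracy depends on the number of workers. Constant inputs, however, are reproduced exactly up to rounding, because Berrut's interpolant reproduces constants.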
LG · Apr 2, 2025
CO-DEFEND: Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection
Diego Cajaraville-Aboy, Marta Moure-Garrido, Carlos Beis-Penedo et al.
The use of DNS over HTTPS (DoH) tunneling by an attacker to hide malicious activity within encrypted DNS traffic poses a serious threat to network security, as it allows malicious actors to bypass traditional monitoring and intrusion detection systems while evading detection by conventional traffic analysis techniques. Machine Learning (ML) techniques can be used to detect DoH tunnels; however, their effectiveness relies on large datasets containing both benign and malicious traffic. Sharing such datasets across entities is challenging due to privacy concerns. In this work, we propose CO-DEFEND (Continuous Decentralized Federated Learning for Secure DoH-Based Threat Detection), a Decentralized Federated Learning (DFL) framework that enables multiple entities to collaboratively train a classification machine learning model while preserving data privacy and enhancing resilience against single points of failure. The proposed DFL framework, which is scalable and privacy-preserving, is based on a federation process that allows multiple entities to train their local models online using incoming DoH flows in real time as they are processed by the entity. In addition, we adapt four classical machine learning algorithms, Support Vector Machines (SVM), Logistic Regression (LR), Decision Trees (DT), and Random Forest (RF), for federated scenarios, comparing their results with more computationally complex alternatives such as neural networks. Using the CIRA-CIC-DoHBrw-2020 dataset, we compare our proposed method with existing machine learning approaches to demonstrate its effectiveness in detecting malicious DoH tunnels and the benefits it brings.
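Here is a minimal sketch of the online-plus-federation loop for one of the listed models, logistic regression. Each entity updates its weights with one SGD step per incoming flow. A simple FedAvg-style averaging round then synchronizes the entities that exchange models. The feature vectors, the learning rate and the names `OnlineLogReg` and `federate` are hypothetical. This abstract does not describe the averaging rule that CO-DEFEND actually uses.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogReg:
    """Logistic regression trained with one SGD step per incoming sample."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def learn_one(self, x, y):
        err = self.predict_proba(x) - y   # gradient of log-loss w.r.t. the logit
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def federate(models):
    """FedAvg-style round: every participant adopts the mean parameters."""
    n = len(models)
    w = [sum(m.w[d] for m in models) / n for d in range(len(models[0].w))]
    b = sum(m.b for m in models) / n
    for m in models:
        m.w, m.b = list(w), b
```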
LG · Nov 14, 2024
Towards efficient compression and communication for prototype-based decentralized learning
Pablo Fernández-Piñeiro, Manuel Fernández-Veiga, Rebeca P. Díaz-Redondo et al.
In prototype-based federated learning, the exchange of model parameters between clients and the master server is replaced by the transmission of prototypes, or quantized versions of the data samples, to the aggregation server. A fully decentralized deployment of prototype-based learning, without a central aggregator of prototypes, is more robust to network failures and reacts faster to changes in the statistical distribution of the data, suggesting potential advantages and quick adaptation in dynamic learning tasks, e.g., when the data sources are IoT devices or when data is non-IID. In this paper, we consider the problem of designing a communication-efficient decentralized learning system based on prototypes. We address the challenge of prototype redundancy by leveraging a twofold data compression technique, i.e., sending update messages only if the prototypes are information-theoretically useful (via the Jensen-Shannon distance), and using clustering on the prototypes to compress the update messages used in the gossip protocol. We also use parallel instead of sequential gossiping, and present an analysis of its age of information (AoI). Our experimental results show that, with these improvements, the communication load can be substantially reduced without decreasing the convergence rate of the learning algorithm.
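The Jensen-Shannon test for deciding whether an update is worth sending can be sketched as follows. The sketch assumes, hypothetically, that a node summarizes its prototypes as a histogram, such as counts per class or per cluster. The node gossips only when the distance to the last histogram it sent exceeds a threshold. With base-2 logarithms, the distance lies in [0, 1]. The threshold and the helper names are assumptions.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance between two (unnormalized) histograms."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


def should_send(last_sent, current, threshold=0.1):
    """Gossip an update only if the prototype summary changed enough."""
    return js_distance(last_sent, current) > threshold
```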