AIOct 30, 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research DirectionsLuca Longo, Mario Brcic, Federico Cabitza et al.
As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.
IVSep 5, 2022
Fuzzy Attention Neural Network to Tackle Discontinuity in Airway SegmentationYang Nan, Javier Del Ser, Zeyu Tang et al.
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computerized tomography (CT) images. However, some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make the computational module prone to discontinuous and false-negative predictions. especially for cohorts with different lung diseases. Attention mechanism has shown the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, the integration of deep attention networks and fuzzy theory, given by the fuzzy attention layer, should be an escalated solution for better generalization and robustness. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of airway segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Different from the existing attention mechanism, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency, generalization and robustness of the proposed method have been proved by training on normal lung disease while testing on datasets of lung cancer, COVID-19 and pulmonary fibrosis.
LGApr 21, 2022
Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel OptimizationAlejandro Rosales-Pérez, Salvador García, Francisco Herrera
Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyperparameters, which comprise the kernel parameter and misclassification costs. The resulting optimization problem is a bilevel problem, where the lower level determines the support vectors and the upper level the hyperparameters. This optimization problem is solved using an evolutionary algorithm (EA) at the upper level and sequential minimal optimization (SMO) at the lower level. These two methods work in a nested fashion, that is, the optimal support vectors help guide the search of the hyperparameters, and the lower level is initialized based on previous successful solutions. The proposed method is assessed using 70 datasets of imbalanced classification and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provided evidence of the effectiveness of EBCS-SVM when working with highly imbalanced datasets.
NEFeb 20, 2023
Multiobjective Evolutionary Pruning of Deep Neural Networks with Transfer Learning for improving their Performance and RobustnessJavier Poyatos, Daniel Molina, Aitor Martínez et al.
Evolutionary Computation algorithms have been used to solve optimization problems in relation with architectural, hyper-parameter or training configuration, forging the field known today as Neural Architecture Search. These algorithms have been combined with other techniques such as the pruning of Neural Networks, which reduces the complexity of the network, and the Transfer Learning, which lets the import of knowledge from another problem related to the one at hand. The usage of several criteria to evaluate the quality of the evolutionary proposals is also a common case, in which the performance and complexity of the network are the most used criteria. This work proposes MO-EvoPruneDeepTL, a multi-objective evolutionary pruning algorithm. MO-EvoPruneDeepTL uses Transfer Learning to adapt the last layers of Deep Neural Networks, by replacing them with sparse layers evolved by a genetic algorithm, which guides the evolution based in the performance, complexity and robustness of the network, being the robustness a great quality indicator for the evolved models. We carry out different experiments with several datasets to assess the benefits of our proposal. Results show that our proposal achieves promising results in all the objectives, and direct relation are presented among them. The experiments also show that the most influential neurons help us explain which parts of the input images are the most relevant for the prediction of the pruned neural network. Lastly, by virtue of the diversity within the Pareto front of pruning patterns produced by the proposal, it is shown that an ensemble of differently pruned models improves the overall performance and robustness of the trained networks.
AIJul 26, 2023
General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Societal Implications and Responsible GovernanceIsaac Triguero, Daniel Molina, Javier Poyatos et al.
Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them. The term General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems. To date, the possibility of an Artificial General Intelligence, powerful enough to perform any intellectual task as if it were human, or even improve it, has remained an aspiration, fiction, and considered a risk for our society. Whilst we might still be far from achieving that, GPAIS is a reality and sitting at the forefront of AI research. This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations. We distinguish between closed-world and open-world GPAIS, characterising their degree of autonomy and ability based on several factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, or proactive acknowledgment of their own limitations. We propose a taxonomy of approaches to realise GPAIS, describing research trends such as the use of AI techniques to improve another AI (AI-powered AI) or (single) foundation models. As a prime example, we delve into GenAI, aligning them with the concepts presented in the taxonomy. We explore multi-modality, which involves fusing various types of data sources to expand the capabilities of GPAIS. Through the proposed definition and taxonomy, our aim is to facilitate research collaboration across different areas that are tackling general purpose tasks, as they share many common aspects. Finally, we discuss the state of GPAIS, prospects, societal implications, and the need for regulation and governance.
LGMay 20, 2022
Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective OptimizationJavier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez et al.
There is a broad consensus on the importance of deep learning models in tasks involving complex data. Often, an adequate understanding of these models is required when focusing on the transparency of decisions in human-critical applications. Besides other explainability techniques, trustworthiness can be achieved by using counterfactuals, like the way a human becomes familiar with an unknown process: by understanding the hypothetical circumstances under which the output changes. In this work we argue that automated counterfactual generation should regard several aspects of the produced adversarial instances, not only their adversarial capability. To this end, we present a novel framework for the generation of counterfactual examples which formulates its goal as a multi-objective optimization problem balancing three different objectives: 1) plausibility, i.e., the likeliness of the counterfactual of being possible as per the distribution of the input data; 2) intensity of the changes to the original input; and 3) adversarial power, namely, the variability of the model's output induced by the counterfactual. The framework departs from a target model to be audited and uses a Generative Adversarial Network to model the distribution of input data, together with a multi-objective solver for the discovery of counterfactuals balancing among these objectives. The utility of the framework is showcased over six classification tasks comprising image and three-dimensional data. The experiments verify that the framework unveils counterfactuals that comply with intuition, increasing the trustworthiness of the user, and leading to further insights, such as the detection of bias and data misrepresentation.
AINov 11, 2022
REVEL Framework to measure Local Linear Explanations for black-box models: Deep Learning Image Classification case of studyIván Sevillano-García, Julián Luengo-Martín, Francisco Herrera
Explainable artificial intelligence is proposed to provide explanations for reasoning performed by an Artificial Intelligence. There is no consensus on how to evaluate the quality of these explanations, since even the definition of explanation itself is not clear in the literature. In particular, for the widely known Local Linear Explanations, there are qualitative proposals for the evaluation of explanations, although they suffer from theoretical inconsistencies. The case of image is even more problematic, where a visual explanation seems to explain a decision while detecting edges is what it really does. There are a large number of metrics in the literature specialized in quantitatively measuring different qualitative aspects so we should be able to develop metrics capable of measuring in a robust and correct way the desirable aspects of the explanations. In this paper, we propose a procedure called REVEL to evaluate different aspects concerning the quality of explanations with a theoretically coherent development. This procedure has several advances in the state of the art: it standardizes the concepts of explanation and develops a series of metrics not only to be able to compare between them but also to obtain absolute information regarding the explanation itself. The experiments have been carried out on image four datasets as benchmark where we show REVEL's descriptive and analytical power.
CVSep 25, 2024
Small data deep learning methodology for in-field disease detectionDavid Herrera-Poyato, Jacinto Domínguez-Rull, Rosana Montes et al.
Early detection of diseases in crops is essential to prevent harvest losses and improve the quality of the final product. In this context, the combination of machine learning and proximity sensors is emerging as a technique capable of achieving this detection efficiently and effectively. For example, this machine learning approach has been applied to potato crops -- to detect late blight (Phytophthora infestans) -- and grapevine crops -- to detect downy mildew. However, most of these AI models found in the specialised literature have been developed using leaf-by-leaf images taken in the lab, which does not represent field conditions and limits their applicability. In this study, we present the first machine learning model capable of detecting mild symptoms of late blight in potato crops through the analysis of high-resolution RGB images captured directly in the field, overcoming the limitations of other publications in the literature and presenting real-world applicability. Our proposal exploits the availability of high-resolution images via the concept of patching, and is based on deep convolutional neural networks with a focal loss function, which makes the model to focus on the complex patterns that arise in field conditions. Additionally, we present a data augmentation scheme that facilitates the training of these neural networks with few high-resolution images, which allows for development of models under the small data paradigm. Our model correctly detects all cases of late blight in the test dataset, demonstrating a high level of accuracy and effectiveness in identifying early symptoms. These promising results reinforce the potential use of machine learning for the early detection of diseases and pests in agriculture, enabling better treatment and reducing their impact on crops.
NEJun 7, 2022
TSFEDL: A Python Library for Time Series Spatio-Temporal Feature Extraction and Prediction using Deep Learning (with Appendices on Detailed Network Architectures and Experimental Cases of Study)Ignacio Aguilera-Martos, Ángel M. García-Vico, Julián Luengo et al.
The combination of convolutional and recurrent neural networks is a promising framework that allows the extraction of high-quality spatio-temporal features together with its temporal dependencies, which is key for time series prediction problems such as forecasting, classification or anomaly detection, amongst others. In this paper, the TSFEDL library is introduced. It compiles 20 state-of-the-art methods for both time series feature extraction and prediction, employing convolutional and recurrent deep neural networks for its use in several data mining tasks. The library is built upon a set of Tensorflow+Keras and PyTorch modules under the AGPLv3 license. The performance validation of the architectures included in this proposal confirms the usefulness of this Python package.
AINov 13, 2025
DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI ModelsJ. Javier Alonso-Ramos, Ignacio Aguilera-Martos, Andrés Herrera-Poyatos et al.
The performance of Machine Learning (ML) models, particularly those operating within the Interpretable Artificial Intelligence (Interpretable AI) framework, is significantly affected by the presence of noise in both training and production data. Denoising has therefore become a critical preprocessing step, typically categorized into instance removal and instance correction techniques. However, existing correction approaches often degrade performance or oversimplify the problem by altering the original data distribution. This leads to unrealistic scenarios and biased models, which is particularly problematic in contexts where interpretable AI models are employed, as their interpretability depends on the fidelity of the underlying data patterns. In this paper, we argue that defining noise independently of the solution may be ineffective, as its nature can vary significantly across tasks and datasets. Using a task-specific high quality solution as a reference can provide a more precise and adaptable noise definition. To this end, we propose DenoGrad, a novel Gradient-based instance Denoiser framework that leverages gradients from an accurate Deep Learning (DL) model trained on the target data -- regardless of the specific task -- to detect and adjust noisy samples. Unlike conventional approaches, DenoGrad dynamically corrects noisy instances, preserving problem's data distribution, and improving AI models robustness. DenoGrad is validated on both tabular and time series datasets under various noise settings against the state-of-the-art. DenoGrad outperforms existing denoising strategies, enhancing the performance of interpretable IA models while standing out as the only high quality approach that preserves the original data distribution.
CRFeb 25
Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated LearningMario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón et al.
Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversarial attacks that compromise model integrity and data confidentiality, a vulnerability exacerbated by the fact that conventional data inspection methods are incompatible with its decentralized design. While integrating FL with Blockchain technology has been proposed to address some limitations, its potential for mitigating adversarial attacks remains largely unexplored. This paper introduces Resilient Federated Chain (RFC), a novel blockchain-enabled FL framework designed specifically to enhance resilience against such threats. RFC builds upon the existing Proof of Federated Learning architecture by repurposing the redundancy of its Pooled Mining mechanism as an active defense layer that can be combined with robust aggregation rules. Furthermore, the framework introduces a flexible evaluation function in its consensus mechanism, allowing for adaptive defense against different attack strategies. Extensive experimental evaluation on image classification tasks under various adversarial scenarios, demonstrates that RFC significantly improves robustness compared to baseline methods, providing a viable solution for securing decentralized learning environments.
LGJul 19, 2024
Fair Overlap Number of Balls (Fair-ONB): A Data-Morphology-based Undersampling Method for Bias ReductionJosé Daniel Pascual-Triana, Alberto Fernández, Paulo Novais et al.
One of the key issues regarding classification problems in Trustworthy Artificial Intelligence is ensuring Fairness in the prediction of different classes when protected (sensitive) features are present. Data quality is critical in these cases, as biases in training data can be reflected in machine learning, impacting human lives and failing to comply with current regulations. One strategy to improve data quality and avoid these problems is preprocessing the dataset. Instance selection via undersampling can foster balanced learning of classes and protected feature values. Performing undersampling in class overlap areas close to the decision boundary should bolster the impact on the classifier. This work proposes Fair Overlap Number of Balls (Fair-ONB), an undersampling method that harnesses the data morphology of the different data groups (obtained from the combination of classes and protected feature values) to perform guided undersampling in overlap areas. It employs attributes of the ball coverage of the groups, such as the radius, number of covered instances and density, to select the most suitable areas for undersampling and reduce bias. Results show that the Fair-ONB method improves model Fairness with low impact on the classifier's predictive performance.
LGFeb 6
Featured Reproducing Kernel Banach Spaces for Learning and Neural NetworksIsabel de la Higuera, Francisco Herrera, M. Victoria Velasco
Reproducing kernel Hilbert spaces provide a foundational framework for kernel-based learning, where regularization and interpolation problems admit finite-dimensional solutions through classical representer theorems. Many modern learning models, however -- including fixed-architecture neural networks equipped with non-quadratic norms -- naturally give rise to non-Hilbertian geometries that fall outside this setting. In Banach spaces, continuity of point-evaluation functionals alone is insufficient to guarantee feature representations or kernel-based learning formulations. In this work, we develop a functional-analytic framework for learning in Banach spaces based on the notion of featured reproducing kernel Banach spaces. We identify the precise structural conditions under which feature maps, kernel constructions, and representer-type results can be recovered beyond the Hilbertian regime. Within this framework, supervised learning is formulated as a minimal-norm interpolation or regularization problem, and existence results together with conditional representer theorems are established. We further extend the theory to vector-valued featured reproducing kernel Banach spaces and show that fixed-architecture neural networks naturally induce special instances of such spaces. This provides a unified function-space perspective on kernel methods and neural networks and clarifies when kernel-based learning principles extend beyond reproducing kernel Hilbert spaces.
CVJan 26
SeNeDiF-OOD: Semantic Nested Dichotomy Fusion for Out-of-Distribution Detection Methodology in Open-World Classification. A Case Study on Monument Style ClassificationIgnacio Antequera-Sánchez, Juan Luis Suárez-Díaz, Rosana Montes et al.
Out-of-distribution (OOD) detection is a fundamental requirement for the reliable deployment of artificial intelligence applications in open-world environments. However, addressing the heterogeneous nature of OOD data, ranging from low-level corruption to semantic shifts, remains a complex challenge that single-stage detectors often fail to resolve. To address this issue, we propose SeNeDiF-OOD, a novel methodology based on Semantic Nested Dichotomy Fusion. This framework decomposes the detection task into a hierarchical structure of binary fusion nodes, where each layer is designed to integrate decision boundaries aligned with specific levels of semantic abstraction. To validate the proposed framework, we present a comprehensive case study using MonuMAI, a real-world architectural style recognition system exposed to an open environment. This application faces a diverse range of inputs, including non-monument images, unknown architectural styles, and adversarial attacks, making it an ideal testbed for our proposal. Through extensive experimental evaluation in this domain, results demonstrate that our hierarchical fusion methodology significantly outperforms traditional baselines, effectively filtering these diverse OOD categories while preserving in-distribution performance.
CYFeb 7, 2024
Teranga Go!: Carpooling Collaborative Consumption Community with multi-criteria hesitant fuzzy linguistic term set opinions to build confidence and trustRosana Montes, Ana M. Sanchez, Pedro Villar et al.
Classic Delphi and Fuzzy Delphi methods are used to test content validity of a data collection tools such as questionnaires. Fuzzy Delphi takes the opinion issued by judges from a linguistic perspective reducing ambiguity in opinions by using fuzzy numbers. We propose an extension named 2-Tuple Fuzzy Linguistic Delphi method to deal with scenarios in which judges show different expertise degrees by using fuzzy multigranular semantics of the linguistic terms and to obtain intermediate and final results expressed by 2-tuple linguistic values. The key idea of our proposal is to validate the full questionnaire by means of the evaluation of its parts, defining the validity of each item as a Decision Making problem. Taking the opinion of experts, we measure the degree of consensus, the degree of consistency, and the linguistic score of each item, in order to detect those items that affect, positively or negatively, the quality of the instrument. Considering the real need to evaluate a b-learning educational experience with a consensual questionnaire, we present a Decision Making model for questionnaire validation that solve it. Additionally, we contribute to this consensus reaching problem by developing an online tool under GPL v3 license. The software visualizes the collective valuations for each iteration and assists to determine which parts of the questionnaire should be modified to reach a consensual solution.
CYFeb 4, 2025
Responsible Artificial Intelligence Systems: A Roadmap to Society's Trust through Trustworthy AI, Auditability, Accountability, and GovernanceAndrés Herrera-Poyatos, Javier Del Ser, Marcos López de Prado et al.
Artificial intelligence (AI) has matured as a technology, necessitating the development of responsibility frameworks that are fair, inclusive, trustworthy, safe and secure, transparent, and accountable. By establishing such frameworks, we can harness the full potential of AI while mitigating its risks, particularly in high-risk scenarios. This requires the design of responsible AI systems based on trustworthy AI technologies and ethical principles, with the aim of ensuring auditability and accountability throughout their design, development, and deployment, adhering to domain-specific regulations and standards. This paper explores the concept of a responsible AI system from a holistic perspective, which encompasses four key dimensions: 1) regulatory context; 2) trustworthy AI technology along with standardization and assessments; 3) auditability and accountability; and 4) AI governance. The aim of this paper is double. First, we analyze and understand these four dimensions and their interconnections in the form of an analysis and overview. Second, the final goal of the paper is to propose a roadmap in the design of responsible AI systems, ensuring that they can gain society's trust. To achieve this trustworthiness, this paper also fosters interdisciplinary discussions on the ethical, legal, social, economic, and cultural aspects of AI from a global governance perspective. Last but not least, we also reflect on the current state and those aspects that need to be developed in the near future, as ten lessons learned.
CLApr 6, 2025
An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainabilityDavid Herrera-Poyatos, Carlos Peláez-González, Cristina Zuheros et al.
Large Language Models (LLMs) have significantly advanced sentiment analysis, yet their inherent uncertainty and variability pose critical challenges to achieving reliable and consistent outcomes. This paper systematically explores the Model Variability Problem (MVP) in LLM-based sentiment analysis, characterized by inconsistent sentiment classification, polarization, and uncertainty arising from stochastic inference mechanisms, prompt sensitivity, and biases in training data. We analyze the core causes of MVP, presenting illustrative examples and a case study to highlight its impact. In addition, we investigate key challenges and mitigation strategies, paying particular attention to the role of temperature as a driver of output randomness and emphasizing the crucial role of explainability in improving transparency and user trust. By providing a structured perspective on stability, reproducibility, and trustworthiness, this study helps develop more reliable, explainable, and robust sentiment analysis models, facilitating their deployment in high-stakes domains such as finance, healthcare, and policymaking, among others.
CRApr 9, 2024
FLEX: FLEXible Federated Learning FrameworkFrancisco Herrera, Daniel Jiménez-López, Alberto Argente-Garrido et al.
In the realm of Artificial Intelligence (AI), the need for privacy and security in data processing has become paramount. As AI applications continue to expand, the collection and handling of sensitive data raise concerns about individual privacy protection. Federated Learning (FL) emerges as a promising solution to address these challenges by enabling decentralized model training on local devices, thus preserving data privacy. This paper introduces FLEX: a FLEXible Federated Learning Framework designed to provide maximum flexibility in FL research experiments. By offering customizable features for data distribution, privacy parameters, and communication strategies, FLEX empowers researchers to innovate and develop novel FL techniques. The framework also includes libraries for specific FL implementations including: (1) anomalies, (2) blockchain, (3) adversarial attacks and defences, (4) natural language processing and (5) decision trees, enhancing its versatility and applicability in various domains. Overall, FLEX represents a significant advancement in FL research, facilitating the development of robust and efficient FL applications.
LGApr 3, 2024
An Interpretable Client Decision Tree Aggregation process for Federated LearningAlberto Argente-Garrido, Cristina Zuheros, M. Victoria Luzón et al.
Trustworthy Artificial Intelligence solutions are essential in today's data-driven applications, prioritizing principles such as robustness, safety, transparency, explainability, and privacy among others. This has led to the emergence of Federated Learning as a solution for privacy and distributed machine learning. While decision trees, as self-explanatory models, are ideal for collaborative model training across multiple devices in resource-constrained environments such as federated learning environments for injecting interpretability in these models. Decision tree structure makes the aggregation in a federated learning environment not trivial. They require techniques that can merge their decision paths without introducing bias or overfitting while keeping the aggregated decision trees robust and generalizable. In this paper, we propose an Interpretable Client Decision Tree Aggregation process for Federated Learning scenarios that keeps the interpretability and the precision of the base decision trees used for the aggregation. This model is based on aggregating multiple decision paths of the decision trees and can be used on different decision tree types, such as ID3 and CART. We carry out the experiments within four datasets, and the analysis shows that the tree built with the model improves the local models, and outperforms the state-of-the-art.
AIMar 22, 2024
Large language models for crowd decision making based on prompt design strategies using ChatGPT: models, analysis and challengesDavid Herrera-Poyatos, Cristina Zuheros, Rosana Montes et al.
Social Media and Internet have the potential to be exploited as a source of opinion to enrich Decision Making solutions. Crowd Decision Making (CDM) is a methodology able to infer opinions and decisions from plain texts, such as reviews published in social media platforms, by means of Sentiment Analysis. Currently, the emergence and potential of Large Language Models (LLMs) lead us to explore new scenarios of automatically understand written texts, also known as natural language processing. This paper analyzes the use of ChatGPT based on prompt design strategies to assist in CDM processes to extract opinions and make decisions. We integrate ChatGPT in CDM processes as a flexible tool that infer the opinions expressed in texts, providing numerical or linguistic evaluations where the decision making models are based on the prompt design strategies. We include a multi-criteria decision making scenario with a category ontology for criteria. We also consider ChatGPT as an end-to-end CDM model able to provide a general opinion and score on the alternatives. We conduct empirical experiments on real data extracted from TripAdvisor, the TripR-2020Large dataset. The analysis of results show a promising branch for developing quality decision making models using ChatGPT. Finally, we discuss the challenges of consistency, sensitivity and explainability associated to the use of LLMs in CDM processes, raising open questions for future studies.
LGFeb 10, 2025
Krum Federated Chain (KFC): Using blockchain to defend against adversarial attacks in Federated LearningMario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón et al.
Federated Learning presents a nascent approach to machine learning, enabling collaborative model training across decentralized devices while safeguarding data privacy. However, its distributed nature renders it susceptible to adversarial attacks. Integrating blockchain technology with Federated Learning offers a promising avenue to enhance security and integrity. In this paper, we tackle the potential of blockchain in defending Federated Learning against adversarial attacks. First, we test Proof of Federated Learning, a well known consensus mechanism designed ad-hoc to federated contexts, as a defense mechanism demonstrating its efficacy against Byzantine and backdoor attacks when at least one miner remains uncompromised. Second, we propose Krum Federated Chain, a novel defense strategy combining Krum and Proof of Federated Learning, valid to defend against any configuration of Byzantine or backdoor attacks, even when all miners are compromised. Our experiments conducted on image classification datasets validate the effectiveness of our proposed approaches.
NEJan 13, 2025
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological PathwaysDaniel Molina, Javier Del Ser, Javier Poyatos et al.
Evolutionary and bioinspired computation are crucial for efficiently addressing complex optimization problems across diverse application domains. By mimicking processes observed in nature, like evolution itself, these algorithms offer innovative solutions beyond the reach of traditional optimization methods. They excel at finding near-optimal solutions in large, complex search spaces, making them invaluable in numerous fields. However, both areas are plagued by challenges at their core, including inadequate benchmarking, problem-specific overfitting, insufficient theoretical grounding, and superfluous proposals justified only by their biological metaphor. This overview recapitulates and analyzes in depth the criticisms concerning the lack of innovation and rigor in experimental studies within the field. To this end, we examine the judgmental positions of the existing literature in an informed attempt to guide the research community toward directions of solid contribution and advancement in these areas. We summarize guidelines for the design of evolutionary and bioinspired optimizers, the development of experimental comparisons, and the derivation of novel proposals that take a step further in the field. We provide a brief note on automating the process of creating these algorithms, which may help align metaheuristic optimization research with its primary objective (solving real-world problems), provided that our identified pathways are followed. Our conclusions underscore the need for a sustained push towards innovation and the enforcement of methodological rigor in prospective studies to fully realize the potential of these advanced computational techniques.
AIJul 21, 2025
Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining WorkNuria Rodríguez-Barroso, Mario García-Márquez, M. Victoria Luzón et al.
In recent years, the development of Trustworthy Artificial Intelligence (TAI) has emerged as a critical objective in the deployment of AI systems across sensitive and high-risk domains. TAI frameworks articulate a comprehensive set of ethical, legal, and technical requirements to ensure that AI technologies are aligned with human values, rights, and societal expectations. Among the various AI paradigms, Federated Learning (FL) presents a promising solution to pressing privacy concerns. However, aligning FL with the rest of the requirements of TAI presents a series of challenges, most of which arise from its inherently distributed nature. In this work, we adopt the requirements TAI as a guiding structure to systematically analyze the challenges of adapting FL to TAI. Specifically, we classify and examine the key obstacles to aligning FL with TAI, providing a detailed exploration of what has been done, the trends, and the remaining work within each of the identified challenges.
AIApr 3, 2024
X-SHIELD: Regularization for eXplainable Artificial IntelligenceIván Sevillano-García, Julián Luengo, Francisco Herrera
As artificial intelligence systems become integral across domains, the demand for explainability grows, the called eXplainable artificial intelligence (XAI). Existing efforts primarily focus on generating and evaluating explanations for black-box models while a critical gap in directly enhancing models remains through these evaluations. It is important to consider the potential of this explanation process to improve model quality with a feedback on training as well. XAI may be used to improve model performance while boosting its explainability. Under this view, this paper introduces Transformation - Selective Hidden Input Evaluation for Learning Dynamics (T-SHIELD), a regularization family designed to improve model quality by hiding features of input, forcing the model to generalize without those features. Within this family, we propose the XAI - SHIELD(X-SHIELD), a regularization for explainable artificial intelligence, which uses explanations to select specific features to hide. In contrast to conventional approaches, X-SHIELD regularization seamlessly integrates into the objective function enhancing model explainability while also improving performance. Experimental validation on benchmark datasets underscores X-SHIELD's effectiveness in improving performance and overall explainability. The improvement is validated through experiments comparing models with and without the X-SHIELD regularization, with further analysis exploring the rationale behind its design choices. This establishes X-SHIELD regularization as a promising pathway for developing reliable artificial intelligence regularization.
LGMay 20, 2024
Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF): A Data-Morphology-based Counterfactual Generation Method for Trustworthy Artificial IntelligenceJosé Daniel Pascual-Triana, Alberto Fernández, Javier Del Ser et al.
Explainable Artificial Intelligence (XAI) is a pivotal research domain aimed at understanding the operational mechanisms of AI systems, particularly those considered ``black boxes'' due to their complex, opaque nature. XAI seeks to make these AI systems more understandable and trustworthy, providing insight into their decision-making processes. By producing clear and comprehensible explanations, XAI enables users, practitioners, and stakeholders to trust a model's decisions. This work analyses the value of data morphology strategies in generating counterfactual explanations. It introduces the Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF) method, a model-agnostic counterfactual generator that leverages data morphology to estimate a model's decision boundaries. The ONB-MACF method constructs hyperspheres in the data space whose covered points share a class, mapping the decision boundary. Counterfactuals are then generated by incrementally adjusting an instance's attributes towards the nearest alternate-class hypersphere, crossing the decision boundary with minimal modifications. By design, the ONB-MACF method generates feasible and sparse counterfactuals that follow the data distribution. Our comprehensive benchmark from a double perspective (quantitative and qualitative) shows that the ONB-MACF method outperforms existing state-of-the-art counterfactual generation methods across multiple quality metrics on diverse tabular datasets. This supports our hypothesis, showcasing the potential of data-morphology-based explainability strategies for trustworthy AI.
CLApr 7, 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language ModelsCarlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros et al.
The study of large language models (LLMs) is a key area in open-world machine learning. Although LLMs demonstrate remarkable natural language processing capabilities, they also face several challenges, including consistency issues, hallucinations, and jailbreak vulnerabilities. Jailbreaking refers to the crafting of prompts that bypass alignment safeguards, leading to unsafe outputs that compromise the integrity of LLMs. This work specifically focuses on the challenge of jailbreak vulnerabilities and introduces a novel taxonomy of jailbreak attacks grounded in the training domains of LLMs. It characterizes alignment failures through generalization, objectives, and robustness gaps. Our primary contribution is a perspective on jailbreak, framed through the different linguistic domains that emerge during LLM training and alignment. This viewpoint highlights the limitations of existing approaches and enables us to classify jailbreak attacks on the basis of the underlying model deficiencies they exploit. Unlike conventional classifications that categorize attacks based on prompt construction methods (e.g., prompt templating), our approach provides a deeper understanding of LLM behavior. We introduce a taxonomy with four categories -- mismatched generalization, competing objectives, adversarial robustness, and mixed attacks -- offering insights into the fundamental nature of jailbreak vulnerabilities. Finally, we present key lessons derived from this taxonomic study.
LGApr 3, 2025
STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainabilityIván Sevillano-García, Julián Luengo, Francisco Herrera
Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.
LGSep 26, 2025
Semantic-Inductive Attribute Selection for Zero-Shot LearningJuan Jose Herrera-Aranda, Guillermo Gomez-Trenado, Francisco Herrera et al.
Zero-Shot Learning is an important paradigm within General-Purpose Artificial Intelligence Systems, particularly in those that operate in open-world scenarios where systems must adapt to new tasks dynamically. Semantic spaces play a pivotal role as they bridge seen and unseen classes, but whether human-annotated or generated by a machine learning model, they often contain noisy, redundant, or irrelevant attributes that hinder performance. To address this, we introduce a partitioning scheme that simulates unseen conditions in an inductive setting (which is the most challenging), allowing attribute relevance to be assessed without access to semantic information from unseen classes. Within this framework, we study two complementary feature-selection strategies and assess their generalisation. The first adapts embedded feature selection to the particular demands of ZSL, turning model-driven rankings into meaningful semantic pruning; the second leverages evolutionary computation to directly explore the space of attribute subsets more broadly. Experiments on five benchmark datasets (AWA2, CUB, SUN, aPY, FLO) show that both methods consistently improve accuracy on unseen classes by reducing redundancy, but in complementary ways: RFS is efficient and competitive though dependent on critical hyperparameters, whereas GA is more costly yet explores the search space more broadly and avoids such dependence. These results confirm that semantic spaces are inherently redundant and highlight the proposed partitioning scheme as an effective tool to refine them under inductive conditions.
LGSep 6, 2025
DCV-ROOD Evaluation Framework: Dual Cross-Validation for Robust Out-of-Distribution DetectionArantxa Urrea-Castaño, Nicolás Segura-Kunsagi, Juan Luis Suárez-Díaz et al.
Out-of-distribution (OOD) detection plays a key role in enhancing the robustness of artificial intelligence systems by identifying inputs that differ significantly from the training distribution, thereby preventing unreliable predictions and enabling appropriate fallback mechanisms. Developing reliable OOD detection methods is a significant challenge, and rigorous evaluation of these techniques is essential for ensuring their effectiveness, as it allows researchers to assess their performance under diverse conditions and to identify potential limitations or failure modes. Cross-validation (CV) has proven to be a highly effective tool for providing a reasonable estimate of the performance of a learning algorithm. Although OOD scenarios exhibit particular characteristics, an appropriate adaptation of CV can lead to a suitable evaluation framework for this setting. This work proposes a dual CV framework for robust evaluation of OOD detection models, aimed at improving the reliability of their assessment. The proposed evaluation framework aims to effectively integrate in-distribution (ID) and OOD data while accounting for their differing characteristics. To achieve this, ID data are partitioned using a conventional approach, whereas OOD data are divided by grouping samples based on their classes. Furthermore, we analyze the context of data with class hierarchy to propose a data splitting that considers the entire class hierarchy to obtain fair ID-OOD partitions to apply the proposed evaluation framework. This framework is called Dual Cross-Validation for Robust Out-of-Distribution Detection (DCV-ROOD). To test the validity of the evaluation framework, we selected a set of state-of-the-art OOD detection methods, both with and without outlier exposure. The results show that the method achieves very fast convergence to the true performance.
CLSep 5, 2025
Triadic Fusion of Cognitive, Functional, and Causal Dimensions for Explainable LLMs: The TAXAL FrameworkDavid Herrera-Poyatos, Carlos Peláez-González, Cristina Zuheros et al.
Large Language Models (LLMs) are increasingly being deployed in high-risk domains where opacity, bias, and instability undermine trust and accountability. Traditional explainability methods, focused on surface outputs, do not capture the reasoning pathways, planning logic, and systemic impacts of agentic LLMs. We introduce TAXAL (Triadic Alignment for eXplainability in Agentic LLMs), a triadic fusion framework that unites three complementary dimensions: cognitive (user understanding), functional (practical utility), and causal (faithful reasoning). TAXAL provides a unified, role-sensitive foundation for designing, evaluating, and deploying explanations in diverse sociotechnical settings. Our analysis synthesizes existing methods, ranging from post-hoc attribution and dialogic interfaces to explanation-aware prompting, and situates them within the TAXAL triadic fusion model. We further demonstrate its applicability through case studies in law, education, healthcare, and public services, showing how explanation strategies adapt to institutional constraints and stakeholder roles. By combining conceptual clarity with design patterns and deployment pathways, TAXAL advances explainability as a technical and sociotechnical practice, supporting trustworthy and context-sensitive LLM applications in the era of agentic AI.
LGMay 27, 2025
Addressing Data Quality Decompensation in Federated Learning via Dynamic Client SelectionQinjun Fei, Nuria Rodríguez-Barroso, María Victoria Luzón et al.
In cross-silo Federated Learning (FL), client selection is critical to ensure high model performance, yet it remains challenging due to data quality decompensation, budget constraints, and incentive compatibility. As training progresses, these factors exacerbate client heterogeneity and degrade global performance. Most existing approaches treat these challenges in isolation, making jointly optimizing multiple factors difficult. To address this, we propose Shapley-Bid Reputation Optimized Federated Learning (SBRO-FL), a unified framework integrating dynamic bidding, reputation modeling, and cost-aware selection. Clients submit bids based on their perceived data quality, and their contributions are evaluated using Shapley values to quantify their marginal impact on the global model. A reputation system, inspired by prospect theory, captures historical performance while penalizing inconsistency. The client selection problem is formulated as a 0-1 integer program that maximizes reputation-weighted utility under budget constraints. Experiments on FashionMNIST, EMNIST, CIFAR-10, and SVHN datasets show that SBRO-FL improves accuracy, convergence speed, and robustness, even in adversarial and low-bid interference scenarios. Our results highlight the importance of balancing data reliability, incentive compatibility, and cost efficiency to enable scalable and trustworthy FL deployments.
CYMay 19, 2025
Aligning Trustworthy AI with Democracy: A Dual Taxonomy of Opportunities and RisksOier Mentxaka, Natalia Díaz-Rodríguez, Mark Coeckelbergh et al.
Artificial Intelligence (AI) poses both significant risks and valuable opportunities for democratic governance. This paper introduces a dual taxonomy to evaluate AI's complex relationship with democracy: the AI Risks to Democracy (AIRD) taxonomy, which identifies how AI can undermine core democratic principles such as autonomy, fairness, and trust; and the AI's Positive Contributions to Democracy (AIPD) taxonomy, which highlights AI's potential to enhance transparency, participation, efficiency, and evidence-based policymaking. Grounded in the European Union's approach to ethical AI governance, and particularly the seven Trustworthy AI requirements proposed by the European Commission's High-Level Expert Group on AI, each identified risk is aligned with mitigation strategies based on EU regulatory and normative frameworks. Our analysis underscores the transversal importance of transparency and societal well-being across all risk categories and offers a structured lens for aligning AI systems with democratic values. By integrating democratic theory with practical governance tools, this paper offers a normative and actionable framework to guide research, regulation, and institutional design to support trustworthy, democratic AI. It provides scholars with a conceptual foundation to evaluate the democratic implications of AI, equips policymakers with structured criteria for ethical oversight, and helps technologists align system design with democratic principles. In doing so, it bridges the gap between ethical aspirations and operational realities, laying the groundwork for more inclusive, accountable, and resilient democratic systems in the algorithmic age.
LGApr 25, 2025
Data Science: a Natural EcosystemEmilio Porcu, Roy El Moukari, Laurent Najman et al.
This manuscript provides a holistic (data-centric) view of what we term essential data science, as a natural ecosystem with challenges and missions stemming from the data universe with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics) with the phases of the data life cycle. Data agents perform tasks driven by specific goals. The data scientist is an abstract entity that comes from the logical organization of data agents with their actions. Data scientists face challenges that are defined according to the missions. We define specific discipline-induced data science, which in turn allows for the definition of pan-data science, a natural ecosystem that integrates specific disciplines with the essential data science. We semantically split the essential data science into computational, and foundational. We claim that there is a serious threat of divergence between computational and foundational data science. Especially, if no approach is taken to rate whether a data universe discovery should be useful or not. We suggest that rigorous approaches to measure the usefulness of data universe discoveries might mitigate such a divergence.
LGMar 27, 2025
Improving $(α, f)$-Byzantine Resilience in Federated Learning via layerwise aggregation and cosine distanceMario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón et al.
The rapid development of artificial intelligence systems has amplified societal concerns regarding their usage, necessitating regulatory frameworks that encompass data privacy. Federated Learning (FL) is posed as potential solution to data privacy challenges in distributed machine learning by enabling collaborative model training {without data sharing}. However, FL systems remain vulnerable to Byzantine attacks, where malicious nodes contribute corrupted model updates. While Byzantine Resilient operators have emerged as a widely adopted robust aggregation algorithm to mitigate these attacks, its efficacy diminishes significantly in high-dimensional parameter spaces, sometimes leading to poor performing models. This paper introduces Layerwise Cosine Aggregation, a novel aggregation scheme designed to enhance robustness of these rules in such high-dimensional settings while preserving computational efficiency. A theoretical analysis is presented, demonstrating the superior robustness of the proposed Layerwise Cosine Aggregation compared to original robust aggregation operators. Empirical evaluation across diverse image classification datasets, under varying data distributions and Byzantine attack scenarios, consistently demonstrates the improved performance of Layerwise Cosine Aggregation, achieving up to a 16% increase in model accuracy.
CRMar 12, 2025
Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrityDaniel Jiménez-López, Nuria Rodríguez-Barroso, M. Victoria Luzón et al.
Deep learning models have an intrinsic privacy issue as they memorize parts of their training data, creating a privacy leakage. Membership Inference Attacks (MIA) exploit it to obtain confidential information about the data used for training, aiming to steal information. They can be repurposed as a measurement of data integrity by inferring whether it was used to train a machine learning model. While state-of-the-art attacks achieve a significant privacy leakage, their requirements are not feasible enough, hindering their role as practical tools to assess the magnitude of the privacy risk. Moreover, the most appropriate evaluation metric of MIA, the True Positive Rate at low False Positive Rate lacks interpretability. We claim that the incorporation of Few-Shot Learning techniques to the MIA field and a proper qualitative and quantitative privacy evaluation measure should deal with these issues. In this context, our proposal is twofold. We propose a Few-Shot learning based MIA, coined as the FeS-MIA model, which eases the evaluation of the privacy breach of a deep learning model by significantly reducing the number of resources required for the purpose. Furthermore, we propose an interpretable quantitative and qualitative measure of privacy, referred to as Log-MIA measure. Jointly, these proposals provide new tools to assess the privacy leakage and to ease the evaluation of the training data integrity of deep learning models, that is, to analyze the privacy breach of a deep learning model. Experiments carried out with MIA over image classification and language modeling tasks and its comparison to the state-of-the-art show that our proposals excel at reporting the privacy leakage of a deep learning model with little extra information.
CVJun 17, 2024
Deep Learning methodology for the identification of wood species using high-resolution macroscopic imagesDavid Herrera-Poyatos, Andrés Herrera-Poyatos, Rosana Montes et al.
Significant advancements in the field of wood species identification are needed worldwide to support sustainable timber trade. In this work we contribute to automate the identification of wood species via high-resolution macroscopic images of timber. The main challenge of this problem is that fine-grained patterns in timber are crucial in order to accurately identify wood species, and these patterns are not properly learned by traditional convolutional neural networks (CNNs) trained on low/medium resolution images. We propose a Timber Deep Learning Identification with Patch-based Inference Voting methodology, abbreviated TDLI-PIV methodology. Our proposal exploits the concept of patching and the availability of high-resolution macroscopic images of timber in order to overcome the inherent challenges that CNNs face in timber identification. The TDLI-PIV methodology is able to capture fine-grained patterns in timber and, moreover, boosts robustness and prediction accuracy via a collaborative voting inference process. In this work we also introduce a new data set of marcroscopic images of timber, called GOIMAI-Phase-I, which has been obtained using optical magnification in order to capture fine-grained details, which contrasts to the other datasets that are publicly available. More concretely, images in GOIMAI-Phase-I are taken with a smartphone with a 24x magnifying lens attached to the camera. Our data set contains 2120 images of timber and covers 37 legally protected wood species. Our experiments have assessed the performance of the TDLI-PIV methodology, involving the comparison with other methodologies available in the literature, exploration of data augmentation methods and the effect that the dataset size has on the accuracy of TDLI-PIV.
NEJun 3, 2024
Evolutionary Computation for the Design and Enrichment of General-Purpose Artificial Intelligence Systems: Survey and ProspectsJavier Poyatos, Javier Del Ser, Salvador Garcia et al.
In Artificial Intelligence, there is an increasing demand for adaptive models capable of dealing with a diverse spectrum of learning tasks, surpassing the limitations of systems devised to cope with a single task. The recent emergence of General-Purpose Artificial Intelligence Systems (GPAIS) poses model configuration and adaptability challenges at far greater complexity scales than the optimal design of traditional Machine Learning models. Evolutionary Computation (EC) has been a useful tool for both the design and optimization of Machine Learning models, endowing them with the capability to configure and/or adapt themselves to the task under consideration. Therefore, their application to GPAIS is a natural choice. This paper aims to analyze the role of EC in the field of GPAIS, exploring the use of EC for their design or enrichment. We also match GPAIS properties to Machine Learning areas in which EC has had a notable contribution, highlighting recent milestones of EC for GPAIS. Furthermore, we discuss the challenges of harnessing the benefits of EC for GPAIS, presenting different strategies to both design and improve GPAIS with EC, covering tangential areas, identifying research niches, and outlining potential research directions for EC and GPAIS.
CYMay 2, 2023
Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and RegulationNatalia Díaz-Rodríguez, Javier Del Ser, Mark Coeckelbergh et al.
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements sustained over three main pillars that should be met throughout the system's entire life cycle: it should be (1) lawful, (2) ethical, and (3) robust, both from a technical and a social perspective. However, attaining truly trustworthy AI concerns a wider vision that comprises the trustworthiness of all processes and actors that are part of the system's life cycle, and considers previous aspects from different lenses. A more holistic vision contemplates four essential axes: the global principles for ethical use and development of AI-based systems, a philosophical take on AI ethics, a risk-based approach to AI regulation, and the mentioned pillars and requirements. The seven requirements (human agency and oversight; robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental wellbeing; and accountability) are analyzed from a triple perspective: What each requirement for trustworthy AI is, Why it is needed, and How each requirement can be implemented in practice. On the other hand, a practical approach to implement trustworthy AI systems allows defining the concept of responsibility of AI-based systems facing the law, through a given auditing process. Therefore, a responsible AI system is the resulting notion we introduce in this work, and a concept of utmost necessity that can be realized through auditing processes, subject to the challenges posed by the use of regulatory sandboxes. Our multidisciplinary vision of trustworthy AI culminates in a debate on the diverging views published lately about the future of AI. Our reflections in this matter conclude that regulation is a key for reaching a consensus among these views, and that trustworthy and responsible AI systems will be crucial for the present and future of our society.
NEFeb 8, 2022
EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning based Deep Neural NetworksJavier Poyatos, Daniel Molina, Aritz. D. Martinez et al.
In recent years, Deep Learning models have shown a great performance in complex optimization problems. They generally require large training datasets, which is a limitation in most practical cases. Transfer learning allows importing the first layers of a pre-trained architecture and connecting them to fully-connected layers to adapt them to a new problem. Consequently, the configuration of the these layers becomes crucial for the performance of the model. Unfortunately, the optimization of these models is usually a computationally demanding task. One strategy to optimize Deep Learning models is the pruning scheme. Pruning methods are focused on reducing the complexity of the network, assuming an expected performance penalty of the model once pruned. However, the pruning could potentially be used to improve the performance, using an optimization algorithm to identify and eventually remove unnecessary connections among neurons. This work proposes EvoPruneDeepTL, an evolutionary pruning model for Transfer Learning based Deep Neural Networks which replaces the last fully-connected layers with sparse layers optimized by a genetic algorithm. Depending on its solution encoding strategy, our proposed model can either perform optimized pruning or feature selection over the densely connected part of the neural network. We carry out different experiments with several datasets to assess the benefits of our proposal. Results show the contribution of EvoPruneDeepTL and feature selection to the overall computational efficiency of the network as a result of the optimization process. In particular, the accuracy is improved, reducing at the same time the number of active neurons in the final layers.
CRJan 20, 2022
Survey on Federated Learning Threats: concepts, taxonomy on attacks and defences, experimental study and challengesNuria Rodríguez-Barroso, Daniel Jiménez López, M. Victoria Luzón et al.
Federated learning is a machine learning paradigm that emerges as a solution to the privacy-preservation demands in artificial intelligence. As machine learning, federated learning is threatened by adversarial attacks against the integrity of the learning model and the privacy of data via a distributed approach to tackle local and global learning. This weak point is exacerbated by the inaccessibility of data in federated learning, which makes harder the protection against adversarial attacks and evidences the need to furtherance the research on defence methods to make federated learning a real solution for safeguarding data privacy. In this paper, we present an extensive review of the threats of federated learning, as well as as their corresponding countermeasures, attacks versus defences. This survey provides a taxonomy of adversarial attacks and a taxonomy of defence methods that depict a general picture of this vulnerability of federated learning and how to overcome it. Likewise, we expound guidelines for selecting the most adequate defence method according to the category of the adversarial attack. Besides, we carry out an extensive experimental study from which we draw further conclusions about the behaviour of attacks and defences and the guidelines for selecting the most adequate defence method according to the category of the adversarial attack. This study is finished leading to meditated learned lessons and challenges.
AIJan 17, 2022
Data Harmonisation for Information Fusion in Digital Healthcare: A State-of-the-Art Systematic Review, Meta-Analysis and Future Research DirectionsYang Nan, Javier Del Ser, Simon Walsh et al.
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
LGNov 11, 2021
Reducing Data Complexity using Autoencoders with Class-informed Loss FunctionsDavid Charte, Francisco Charte, Francisco Herrera
Available data in machine learning applications is becoming increasingly complex, due to higher dimensionality and difficult classes. There exists a wide variety of approaches to measuring complexity of labeled data, according to class overlap, separability or boundary shapes, as well as group morphology. Many techniques can transform the data in order to find better features, but few focus on specifically reducing data complexity. Most data transformation methods mainly treat the dimensionality aspect, leaving aside the available information within class labels which can be useful when classes are somehow complex. This paper proposes an autoencoder-based approach to complexity reduction, using class labels in order to inform the loss function about the adequacy of the generated variables. This leads to three different new feature learners, Scorer, Skaler and Slicer. They are based on Fisher's discriminant ratio, the Kullback-Leibler divergence and least-squares support vector machines, respectively. They can be applied as a preprocessing stage for a binary classification problem. A thorough experimentation across a collection of 27 datasets and a range of complexity and classification metrics shows that class-informed autoencoders perform better than 4 other popular unsupervised feature extraction techniques, especially when the final objective is using the data for a classification task.
LGSep 8, 2021
A robust approach for deep neural networks in presence of label noise: relabelling and filtering instances during trainingAnabel Gómez-Ríos, Julián Luengo, Francisco Herrera
Deep learning has outperformed other machine learning algorithms in a variety of tasks, and as a result, it is widely used. However, like other machine learning algorithms, deep learning, and convolutional neural networks (CNNs) in particular, perform worse when the data sets present label noise. Therefore, it is important to develop algorithms that help the training of deep networks and their generalization to noise-free test sets. In this paper, we propose a robust training strategy against label noise, called RAFNI, that can be used with any CNN. This algorithm filters and relabels instances of the training set based on the predictions and their probabilities made by the backbone neural network during the training process. That way, this algorithm improves the generalization ability of the CNN on its own. RAFNI consists of three mechanisms: two mechanisms that filter instances and one mechanism that relabels instances. In addition, it does not suppose that the noise rate is known nor does it need to be estimated. We evaluated our algorithm using different data sets of several sizes and characteristics. We also compared it with state-of-the-art models using the CIFAR10 and CIFAR100 benchmarks under different types and rates of label noise and found that RAFNI achieves better results in most cases.
LGMay 26, 2021
Anomaly Detection in Predictive Maintenance: A New Evaluation Framework for Temporal Unsupervised Anomaly Detection AlgorithmsJacinto Carrasco, Irina Markova, David López et al.
The research in anomaly detection lacks a unified definition of what represents an anomalous instance. Discrepancies in the nature itself of an anomaly lead to multiple paradigms of algorithms design and experimentation. Predictive maintenance is a special case, where the anomaly represents a failure that must be prevented. Related time-series research as outlier and novelty detection or time-series classification does not apply to the concept of an anomaly in this field, because they are not single points which have not been seen previously and may not be precisely annotated. Moreover, due to the lack of annotated anomalous data, many benchmarks are adapted from supervised scenarios. To address these issues, we generalise the concept of positive and negative instances to intervals to be able to evaluate unsupervised anomaly detection algorithms. We also preserve the imbalance scheme for evaluation through the proposal of the Preceding Window ROC, a generalisation for the calculation of ROC curves for time-series scenarios. We also adapt the mechanism from a established time-series anomaly detection benchmark to the proposed generalisations to reward early detection. Therefore, the proposal represents a flexible evaluation framework for the different scenarios. To show the usefulness of this definition, we include a case study of Big Data algorithms with a real-world time-series problem provided by the company ArcelorMittal, and compare the proposal with an evaluation method.
CVMay 25, 2021
CI-dataset and DetDSCI methodology for detecting too small and too large critical infrastructures in satellite images: Airports and electrical substations as case studyFrancisco Pérez-Hernández, José Rodríguez-Ortega, Yassir Benhammou et al.
The detection of critical infrastructures in large territories represented by aerial and satellite images is of high importance in several fields such as in security, anomaly detection, land use planning and land use change detection. However, the detection of such infrastructures is complex as they have highly variable shapes and sizes, i.e., some infrastructures, such as electrical substations, are too small while others, such as airports, are too large. Besides, airports can have a surface area either small or too large with completely different shapes, which makes its correct detection challenging. As far as we know, these limitations have not been tackled yet in previous works. This paper presents (1) a smart Critical Infrastructure dataset, named CI-dataset, organised into two scales, small and large scales critical infrastructures and (2) a two-level resolution-independent critical infrastructure detection (DetDSCI) methodology that first determines the spatial resolution of the input image using a classification model, then analyses the image using the appropriate detector for that spatial resolution. The present study targets two representative classes, airports and electrical substations. Our experiments show that DetDSCI methodology achieves up to 37,53% F1 improvement with respect to Faster R-CNN, one of the most influential detection models.
LGApr 24, 2021
EXplainable Neural-Symbolic Learning (X-NeSyL) methodology to fuse deep learning representations with expert knowledge graphs: the MonuMAI cultural heritage use caseNatalia Díaz-Rodríguez, Alberto Lamas, Jules Sanchez et al.
The latest Deep Learning (DL) models for detection and classification have achieved an unprecedented performance over classical machine learning algorithms. However, DL models are black-box methods hard to debug, interpret, and certify. DL alone cannot provide explanations that can be validated by a non technical audience. In contrast, symbolic AI systems that convert concepts into rules or symbols -- such as knowledge graphs -- are easier to explain. However, they present lower generalisation and scaling capabilities. A very important challenge is to fuse DL representations with expert knowledge. One way to address this challenge, as well as the performance-explainability trade-off is by leveraging the best of both streams without obviating domain expert knowledge. We tackle such problem by considering the symbolic knowledge is expressed in form of a domain expert knowledge graph. We present the eXplainable Neural-symbolic learning (X-NeSyL) methodology, designed to learn both symbolic and deep representations, together with an explainability metric to assess the level of alignment of machine and human expert explanations. The ultimate objective is to fuse DL representations with expert domain knowledge during the learning process to serve as a sound basis for explainability. X-NeSyL methodology involves the concrete use of two notions of explanation at inference and training time respectively: 1) EXPLANet: Expert-aligned eXplainable Part-based cLAssifier NETwork Architecture, a compositional CNN that makes use of symbolic representations, and 2) SHAP-Backprop, an explainable AI-informed training procedure that guides the DL process to align with such symbolic representations in form of knowledge graphs. We showcase X-NeSyL methodology using MonuMAI dataset for monument facade image classification, and demonstrate that our approach improves explainability and performance.
CVApr 23, 2021
MULTICAST: MULTI Confirmation-level Alarm SysTem based on CNN and LSTM to mitigate false alarms for handgun detection in video-surveillanceRoberto Olmos, Siham Tabik, Francisco Perez-Hernandez et al.
Despite the constant advances in computer vision, integrating modern single-image detectors in real-time handgun alarm systems in video-surveillance is still debatable. Using such detectors still implies a high number of false alarms and false negatives. In this context, most existent studies select one of the latest single-image detectors and train it on a better dataset or use some pre-processing, post-processing or data-fusion approach to further reduce false alarms. However, none of these works tried to exploit the temporal information present in the videos to mitigate false detections. This paper presents a new system, called MULTI Confirmation-level Alarm SysTem based on Convolutional Neural Networks (CNN) and Long Short Term Memory networks (LSTM) (MULTICAST), that leverages not only the spacial information but also the temporal information existent in the videos for a more reliable handgun detection. MULTICAST consists of three stages, i) a handgun detection stage, ii) a CNN-based spacial confirmation stage and iii) LSTM-based temporal confirmation stage. The temporal confirmation stage uses the positions of the detected handgun in previous instants to predict its trajectory in the next frame. Our experiments show that MULTICAST reduces by 80% the number of false alarms with respect to Faster R-CNN based-single-image detector, which makes it more useful in providing more effective and rapid security responses.
NEOct 8, 2020
AT-MFCGA: An Adaptive Transfer-guided Multifactorial Cellular Genetic Algorithm for Evolutionary MultitaskingEneko Osaba, Javier Del Ser, Aritz D. Martinez et al.
Transfer Optimization is an incipient research area dedicated to solving multiple optimization tasks simultaneously. Among the different approaches that can address this problem effectively, Evolutionary Multitasking resorts to concepts from Evolutionary Computation to solve multiple problems within a single search process. In this paper we introduce a novel adaptive metaheuristic algorithm to deal with Evolutionary Multitasking environments coined as Adaptive Transfer-guided Multifactorial Cellular Genetic Algorithm (AT-MFCGA). AT-MFCGA relies on cellular automata to implement mechanisms in order to exchange knowledge among the optimization problems under consideration. Furthermore, our approach is able to explain by itself the synergies among tasks that were encountered and exploited during the search, which helps us to understand interactions between related optimization tasks. A comprehensive experimental setup is designed to assess and compare the performance of AT-MFCGA to that of other renowned evolutionary multitasking alternatives (MFEA and MFEA-II). Experiments comprise 11 multitasking scenarios composed of 20 instances of 4 combinatorial optimization problems, yielding the largest discrete multitasking environment solved to date. Results are conclusive in regard to the superior quality of solutions provided by AT-MFCGA with respect to the rest of the methods, which are complemented by a quantitative examination of the genetic transferability among tasks throughout the search process.
LGSep 21, 2020
CURIE: A Cellular Automaton for Concept Drift DetectionJesus L. Lobo, Javier Del Ser, Eneko Osaba et al.
Data stream mining extracts information from large quantities of data flowing fast and continuously (data streams). They are usually affected by changes in the data distribution, giving rise to a phenomenon referred to as concept drift. Thus, learning models must detect and adapt to such changes, so as to exhibit a good predictive performance after a drift has occurred. In this regard, the development of effective drift detection algorithms becomes a key factor in data stream mining. In this work we propose CU RIE, a drift detector relying on cellular automata. Specifically, in CU RIE the distribution of the data stream is represented in the grid of a cellular automata, whose neighborhood rule can then be utilized to detect possible distribution changes over the stream. Computer simulations are presented and discussed to show that CU RIE, when hybridized with other base learners, renders a competitive behavior in terms of detection metrics and classification accuracy. CU RIE is compared with well-established drift detectors over synthetic datasets with varying drift characteristics.
NEAug 9, 2020
Lights and Shadows in Evolutionary Deep Learning: Taxonomy, Critical Methodological Analysis, Cases of Study, Learned Lessons, Recommendations and ChallengesAritz D. Martinez, Javier Del Ser, Esther Villar-Rodriguez et al.
Much has been said about the fusion of bio-inspired optimization algorithms and Deep Learning models for several purposes: from the discovery of network topologies and hyper-parametric configurations with improved performance for a given task, to the optimization of the model's parameters as a replacement for gradient-based solvers. Indeed, the literature is rich in proposals showcasing the application of assorted nature-inspired approaches for these tasks. In this work we comprehensively review and critically examine contributions made so far based on three axes, each addressing a fundamental question in this research avenue: a) optimization and taxonomy (Why?), including a historical perspective, definitions of optimization problems in Deep Learning, and a taxonomy associated with an in-depth analysis of the literature, b) critical methodological analysis (How?), which together with two case studies, allows us to address learned lessons and recommendations for good practices following the analysis of the literature, and c) challenges and new directions of research (What can be done, and what for?). In summary, three axes - optimization and taxonomy, critical analysis, and challenges - which outline a complete vision of a merger of two technologies drawing up an exciting future for this area of fusion research.