Cristina Soguero-Ruiz

LG
h-index21
12papers
203citations
Novelty41%
AI Score30

12 Papers

LGJul 24, 2024
Explainable Artificial Intelligence Techniques for Irregular Temporal Classification of Multidrug Resistance Acquisition in Intensive Care Unit Patients

Óscar Escudero-Arnanz, Cristina Soguero-Ruiz, Joaquín Álvarez-Rodríguez et al.

Antimicrobial Resistance represents a significant challenge in the Intensive Care Unit (ICU), where patients are at heightened risk of Multidrug-Resistant (MDR) infections-pathogens resistant to multiple antimicrobial agents. This study introduces a novel methodology that integrates Gated Recurrent Units (GRUs) with advanced intrinsic and post-hoc interpretability techniques for detecting the onset of MDR in patients across time. Within interpretability methods, we propose Explainable Artificial Intelligence (XAI) approaches to handle irregular Multivariate Time Series (MTS), introducing Irregular Time Shapley Additive Explanations (IT-SHAP), a modification of Shapley Additive Explanations designed for irregular MTS with Recurrent Neural Networks focused on temporal outputs. Our methodology aims to identify specific risk factors associated with MDR in ICU patients. GRU with Hadamard's attention demonstrated high initial specificity and increasing sensitivity over time, correlating with increased nosocomial infection risks during prolonged ICU stays. XAI analysis, enhanced by Hadamard attention and IT-SHAP, identified critical factors such as previous non-resistant cultures, specific antibiotic usage patterns, and hospital environment dynamics. These insights suggest that early detection of at-risk patients can inform interventions such as preventive isolation and customized treatments, significantly improving clinical outcomes. The proposed GRU model for temporal classification achieved an average Receiver Operating Characteristic Area Under the Curve of 78.27 +- 1.26 over time, indicating strong predictive performance. In summary, this study highlights the clinical utility of our methodology, which combines predictive accuracy with interpretability, thereby facilitating more effective healthcare interventions by professionals.

LGApr 24, 2025Code
Early Detection of Multidrug Resistance Using Multivariate Time Series Analysis and Interpretable Patient-Similarity Representations

Óscar Escudero-Arnanz, Antonio G. Marques, Inmaculada Mora-Jiménez et al.

Background and Objectives: Multidrug Resistance (MDR) is a critical global health issue, causing increased hospital stays, healthcare costs, and mortality. This study proposes an interpretable Machine Learning (ML) framework for MDR prediction, aiming for both accurate inference and enhanced explainability. Methods: Patients are modeled as Multivariate Time Series (MTS), capturing clinical progression and patient-to-patient interactions. Similarity among patients is quantified using MTS-based methods: descriptive statistics, Dynamic Time Warping, and Time Cluster Kernel. These similarity measures serve as inputs for MDR classification via Logistic Regression, Random Forest, and Support Vector Machines, with dimensionality reduction and kernel transformations improving model performance. For explainability, patient similarity networks are constructed from these metrics. Spectral clustering and t-SNE are applied to identify MDR-related subgroups and visualize high-risk clusters, enabling insight into clinically relevant patterns. Results: The framework was validated on ICU Electronic Health Records from the University Hospital of Fuenlabrada, achieving an AUC of 81%. It outperforms baseline ML and deep learning models by leveraging graph-based patient similarity. The approach identifies key risk factors -- prolonged antibiotic use, invasive procedures, co-infections, and extended ICU stays -- and reveals clinically meaningful clusters. Code and results are available at \https://github.com/oscarescuderoarnanz/DM4MTS. Conclusions: Patient similarity representations combined with graph-based analysis provide accurate MDR prediction and interpretable insights. This method supports early detection, risk factor identification, and patient stratification, highlighting the potential of explainable ML in critical care.

CVApr 26, 2024
LM-IGTD: a 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks

Vanesa Gómez-Martínez, Francisco J. Lara-Abelenda, Pablo Peiro-Corbacho et al.

Tabular data have been extensively used in different knowledge domains. Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features (images), outperforming predictive results of traditional models. Recently, several researchers have proposed transforming tabular data into images to leverage the potential of CNNs and obtain high results in predictive tasks such as classification and regression. In this paper, we present a novel and effective approach for transforming tabular data into images, addressing the inherent limitations associated with low-dimensional and mixed-type datasets. Our method, named Low Mixed-Image Generator for Tabular Data (LM-IGTD), integrates a stochastic feature generation process and a modified version of the IGTD. We introduce an automatic and interpretable end-to-end pipeline, enabling the creation of images from tabular data. A mapping between original features and the generated images is established, and post hoc interpretability methods are employed to identify crucial areas of these images, enhancing interpretability for predictive tasks. An extensive evaluation of the tabular-to-image generation approach proposed on 12 low-dimensional and mixed-type datasets, including binary and multi-class classification scenarios. In particular, our method outperformed all traditional ML models trained on tabular data in five out of twelve datasets when using images generated with LM-IGTD and CNN. In the remaining datasets, LM-IGTD images and CNN consistently surpassed three out of four traditional ML models, achieving similar results to the fourth model.

LGNov 1, 2024
Explainable Spatio-Temporal GCNNs for Irregular Multivariate Time Series: Architecture and Application to ICU Patient Data

Óscar Escudero-Arnanz, Cristina Soguero-Ruiz, Antonio G. Marques

In this paper, we present XST-GCNN (eXplainable Spatio-Temporal Graph Convolutional Neural Network), a novel architecture for processing heterogeneous and irregular Multivariate Time Series (MTS) data. Our approach captures temporal and feature dependencies within a unified spatio-temporal pipeline by leveraging a GCNN that uses a spatio-temporal graph aimed at optimizing predictive accuracy and interoperability. For graph estimation, we introduce techniques, including one based on the (heterogeneous) Gower distance. Once estimated, we propose two methods for graph construction: one based on the Cartesian product, treating temporal instants homogeneously, and another spatio-temporal approach with distinct graphs per time step. We also propose two GCNN architectures: a standard GCNN with a normalized adjacency matrix and a higher-order polynomial GCNN. In addition to accuracy, we emphasize explainability by designing an inherently interpretable model and performing a thorough interpretability analysis, identifying key feature-time combinations that drive predictions. We evaluate XST-GCNN using real-world Electronic Health Record data from University Hospital of Fuenlabrada to predict Multidrug Resistance (MDR) in ICU patients, a critical healthcare challenge linked to high mortality and complex treatments. Our architecture outperforms traditional models, achieving a mean ROC-AUC score of 81.03 +- 2.43. Furthermore, the interpretability analysis provides actionable insights into clinical factors driving MDR predictions, enhancing model transparency. This work sets a benchmark for tackling complex inference tasks with heterogeneous MTS, offering a versatile, interpretable solution for real-world applications.

LGFeb 9, 2024
Multimodal Interpretable Data-Driven Models for Early Prediction of Antimicrobial Multidrug Resistance Using Multivariate Time-Series

Sergio Martínez-Agüero, Antonio G. Marques, Inmaculada Mora-Jiménez et al.

Electronic health records (EHR) is an inherently multimodal register of the patient's health status characterized by static data and multivariate time series (MTS). While MTS are a valuable tool for clinical prediction, their fusion with other data modalities can possibly result in more thorough insights and more accurate results. Deep neural networks (DNNs) have emerged as fundamental tools for identifying and defining underlying patterns in the healthcare domain. However, fundamental improvements in interpretability are needed for DNN models to be widely used in the clinical setting. In this study, we present an approach built on a collection of interpretable multimodal data-driven models that may anticipate and understand the emergence of antimicrobial multidrug resistance (AMR) germs in the intensive care unit (ICU) of the University Hospital of Fuenlabrada (Madrid, Spain). The profile and initial health status of the patient are modeled using static variables, while the evolution of the patient's health status during the ICU stay is modeled using several MTS, including mechanical ventilation and antibiotics intake. The multimodal DNNs models proposed in this paper include interpretable principles in addition to being effective at predicting AMR and providing an explainable prediction support system for AMR in the ICU. Furthermore, our proposed methodology based on multimodal models and interpretability schemes can be leveraged in additional clinical problems dealing with EHR data, broadening the impact and applicability of our results.

LGJul 7, 2021
On the Use of Time Series Kernel and Dimensionality Reduction to Identify the Acquisition of Antimicrobial Multidrug Resistance in the Intensive Care Unit

Óscar Escudero-Arnanz, Joaquín Rodríguez-Álvarez, Karl Øyvind Mikalsen et al.

The acquisition of Antimicrobial Multidrug Resistance (AMR) in patients admitted to the Intensive Care Units (ICU) is a major global concern. This study analyses data in the form of multivariate time series (MTS) from 3476 patients recorded at the ICU of University Hospital of Fuenlabrada (Madrid) from 2004 to 2020. 18\% of the patients acquired AMR during their stay in the ICU. The goal of this paper is an early prediction of the development of AMR. Towards that end, we leverage the time-series cluster kernel (TCK) to learn similarities between MTS. To evaluate the effectiveness of TCK as a kernel, we applied several dimensionality reduction techniques for visualization and classification tasks. The experimental results show that TCK allows identifying a group of patients that acquire the AMR during the first 48 hours of their ICU stay, and it also provides good classification capabilities.

MLFeb 27, 2020
A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs

Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Robert Jenssen

A large fraction of the electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient's health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel which is capable of exploiting both the information from the observed values as well the information hidden in the missing patterns in multivariate time series (MTS) originating e.g. from EHRs. The kernel, called TCK$_{IM}$, is designed using an ensemble learning strategy in which the base models are novel mixed mode Bayesian mixture models which can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters and therefore TCK$_{IM}$ is particularly well suited if there is a lack of labels - a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.

LGJul 10, 2019
Time series cluster kernels to exploit informative missingness and incomplete label information

Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi et al.

The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as e.g. medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.

MLFeb 20, 2019
Noisy multi-label semi-supervised dimensionality reduction

Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi et al.

Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on solving the challenge posed by noisy labels in non-standard settings. This includes situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed Noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, as well as a real-world case study, demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.

MLMar 21, 2018
An Unsupervised Multivariate Time Series Kernel Approach for Identifying Patients with Surgical Site Infection from Blood Samples

Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi et al.

A large fraction of the electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we propose a surgical site infection detection framework for patients undergoing colorectal cancer surgery that is completely unsupervised, hence alleviating the problem of getting access to labelled training data. The framework is based on powerful kernels for multivariate time series that account for missing data when computing similarities. Our approach show superior performance compared to baselines that have to resort to imputation techniques and performs comparable to a supervised classification baseline.

NENov 17, 2017
Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks

Andreas Storvik Strauman, Filippo Maria Bianchi, Karl Øyvind Mikalsen et al.

Clinical measurements that can be represented as time series constitute an important fraction of the electronic health records and are often both uncertain and incomplete. Recurrent neural networks are a special class of neural networks that are particularly suitable to process time series data but, in their original formulation, cannot explicitly deal with missing data. In this paper, we explore imputation strategies for handling missing values in classifiers based on recurrent neural network (RNN) and apply a recently proposed recurrent architecture, the Gated Recurrent Unit with Decay, specifically designed to handle missing data. We focus on the problem of detecting surgical site infection in patients by analyzing time series of their blood sample measurements and we compare the results obtained with different RNN-based classifiers.

MLApr 3, 2017
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

Karl Øyvind Mikalsen, Filippo Maria Bianchi, Cristina Soguero-Ruiz et al.

Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust \emph{time series cluster kernel} (TCK). The approach taken leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel. We evaluate the TCK on synthetic and real data and compare to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data.