Walter Karlen

LG
h-index: 8
12 papers
930 citations
Novelty: 55%
AI Score: 34

12 Papers

SP Apr 11, 2025
Artifact detection and localization in single-channel mobile EEG for sleep research using deep learning and attention mechanisms

Khrystyna Semkiv, Jia Zhang, Maria Laura Ferster et al.

Artifacts in the electroencephalogram (EEG) degrade signal quality and impact the analysis of brain activity. Current methods for detecting artifacts in sleep EEG rely on simple threshold-based algorithms that require manual intervention, which is time-consuming and impractical due to the vast volume of data that novel mobile recording systems generate. We propose a convolutional neural network (CNN) model incorporating a convolutional block attention module (CNN-CBAM) to detect and identify the location of artifacts in the sleep EEG with attention maps. We benchmarked this model against six other machine learning and signal processing approaches. We trained and tuned all models on 72 manually annotated EEG recordings obtained during home-based monitoring from 18 healthy participants with a mean (SD) age of 68.05 (±5.02) years. We tested them on 26 separate recordings from 6 healthy participants with a mean (SD) age of 68.33 (±4.08) years, which contained artifacts in 4% of epochs. CNN-CBAM achieved the highest area under the receiver operating characteristic curve (0.88), sensitivity (0.81), and specificity (0.86) when compared to the other approaches. The attention maps from CNN-CBAM localized artifacts within the epoch with a sensitivity of 0.71 and specificity of 0.67. This work demonstrates the feasibility of automating the detection and localization of artifacts in wearable sleep EEG.

CV Nov 2, 2021
Detect-and-Segment: a Deep Learning Approach to Automate Wound Image Segmentation

Gaetano Scebba, Jia Zhang, Sabrina Catanzaro et al.

Chronic wounds significantly impact quality of life. If not properly managed, they can severely deteriorate. Image-based wound analysis could aid in objectively assessing the wound status by quantifying important features that are related to healing. However, the high heterogeneity of the wound types, image background composition, and capturing conditions challenge the robust segmentation of wound images. We present Detect-and-Segment (DS), a deep learning approach to produce wound segmentation maps with high generalization capabilities. In our approach, dedicated deep neural networks detected the wound position, isolated the wound from the uninformative background, and computed the wound segmentation map. We evaluated this approach using one data set with images of diabetic foot ulcers. For further testing, 4 supplemental independent data sets with a larger variety of wound types from different body locations were used. The Matthews correlation coefficient (MCC) improved from 0.29 when computing the segmentation on the full image to 0.85 when combining detection and segmentation in the same approach. When tested on the wound images drawn from the supplemental data sets, the DS approach increased the mean MCC from 0.17 to 0.85. Furthermore, the DS approach enabled the training of segmentation models with up to 90% less training data while maintaining the segmentation performance.
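For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient is computed from pixel-level confusion counts. The function name `mcc` and the convention of returning 0.0 for a degenerate confusion matrix are illustrative assumptions, not details taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts.

    tp/fp: pixels predicted as wound that are / are not wound.
    tn/fn: pixels predicted as background that are / are not background.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an undefined MCC (an empty row or column
        # in the confusion matrix) is reported as 0.0.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mcc(tp=50, tn=40, fp=5, fn=5))
```

MCC ranges from -1 to 1 and stays informative when the wound occupies only a small fraction of the image, which is one reason it is often preferred over plain accuracy for segmentation with imbalanced classes.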

SP Apr 21, 2020
Multispectral Video Fusion for Non-contact Monitoring of Respiratory Rate and Apnea

Gaetano Scebba, Giulia Da Poian, Walter Karlen

Continuous monitoring of respiratory activity is desirable in many clinical applications to detect respiratory events. Non-contact monitoring of respiration can be achieved with near- and far-infrared spectrum cameras. However, current technologies are not sufficiently robust to be used in clinical applications. For example, they fail to estimate an accurate respiratory rate (RR) during apnea. We present a novel algorithm based on multispectral data fusion that aims to estimate RR even during apnea. The algorithm independently addresses the RR estimation and apnea detection tasks. Respiratory information is extracted from multiple sources and fed into an RR estimator and an apnea detector whose results are fused into a final respiratory activity estimation. We evaluated the system retrospectively using data from 30 healthy adults who performed diverse controlled breathing tasks while lying supine in a dark room and reproduced central and obstructive apneic events. Combining respiratory information from multiple multispectral cameras reduced the root mean square error (RMSE) of the RR estimation from up to 4.64 breaths/min with monospectral data to 1.60 breaths/min. The median F1 scores for classifying obstructive (0.75 to 0.86) and central apnea (0.75 to 0.93) also improved. Furthermore, the independent consideration of apnea detection led to a more robust system (RMSE of 4.44 vs. 7.96 breaths/min). Our findings may represent a step towards the use of cameras for vital sign monitoring in medical applications.
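The abstract describes fusing the output of an RR estimator with that of a separate apnea detector, but does not spell out the fusion rule. The sketch below assumes the simplest possible rule (report 0 breaths/min whenever apnea is flagged) purely to illustrate why handling apnea separately can lower the RMSE; the function names and the toy values are hypothetical.

```python
import math


def fuse(rr_estimates, apnea_flags):
    """Override the RR estimate with 0 breaths/min where apnea is flagged.

    This gating rule is an assumption made for illustration; the paper's
    actual fusion step may differ.
    """
    return [0.0 if apnea else float(rr)
            for rr, apnea in zip(rr_estimates, apnea_flags)]


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy values (breaths/min), not data from the study.
    reference = [12.0, 0.0, 15.0]      # second window is an apneic event
    raw_estimates = [12.0, 8.0, 15.0]  # estimator keeps reporting a rate
    apnea_flags = [False, True, False]

    print("RMSE without fusion:", rmse(raw_estimates, reference))
    print("RMSE with fusion:   ", rmse(fuse(raw_estimates, apnea_flags), reference))
```

An estimator that always reports a plausible rate is penalised heavily in apneic windows, so a dedicated detector that can override it removes the largest error terms.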

CY Jan 2, 2020
A Deep Learning Approach to Diagnosing Multiple Sclerosis from Smartphone Data

Patrick Schwab, Walter Karlen

Multiple sclerosis (MS) affects the central nervous system with a wide range of symptoms. MS can, for example, cause pain, changes in mood and fatigue, and may impair a person's movement, speech and visual functions. Diagnosis of MS typically involves a combination of complex clinical assessments and tests to rule out other diseases with similar symptoms. New technologies, such as smartphone monitoring in free-living conditions, could potentially aid in objectively assessing the symptoms of MS by quantifying symptom presence and intensity over long periods of time. Here, we present a deep-learning approach to diagnosing MS from smartphone-derived digital biomarkers that uses a novel combination of a multilayer perceptron with neural soft attention to improve learning of patterns in long-term smartphone monitoring data. Using data from a cohort of 774 participants, we demonstrate that our deep-learning models are able to distinguish between people with and without MS with an area under the receiver operating characteristic curve of 0.88 (95% CI: 0.70, 0.88). Our experimental results indicate that digital biomarkers derived from smartphone data could in the future be used as additional diagnostic criteria for MS.

LG Oct 27, 2019
CXPlain: Causal Explanations for Model Interpretation under Uncertainty

Patrick Schwab, Walter Karlen

Feature importance estimates that inform users about the degree to which given inputs influence the output of a predictive model are crucial for understanding, validating, and interpreting machine-learning models. However, providing fast and accurate estimates of feature importance for high-dimensional data, and quantifying the uncertainty of such estimates remain open challenges. Here, we frame the task of providing explanations for the decisions of machine-learning models as a causal learning task, and train causal explanation (CXPlain) models that learn to estimate to what degree certain inputs cause outputs in another machine-learning model. CXPlain can, once trained, be used to explain the target model in little time, and enables the quantification of the uncertainty associated with its feature importance estimates via bootstrap ensembling. We present experiments that demonstrate that CXPlain is significantly more accurate and faster than existing model-agnostic methods for estimating feature importance. In addition, we confirm that the uncertainty estimates provided by CXPlain ensembles are strongly correlated with their ability to accurately estimate feature importance on held-out data.
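As a rough illustration of treating feature importance as a question of what causes a model's output, the sketch below scores each input by how much the target model's error grows when that input is masked, then normalises the scores to sum to one. The abstract does not state this formula, so the masking value, the clipping at zero, and the names `masked_importance` and `toy_model` are all assumptions for illustration. The explanation model and the bootstrap ensembling that CXPlain uses are not shown.

```python
def masked_importance(model, x, y, loss, baseline=0.0):
    """Score each feature by the error increase caused by masking it.

    model:    callable mapping a feature list to a prediction
    loss:     callable (prediction, target) -> non-negative error
    baseline: value substituted for a masked feature (an assumption)
    """
    base_error = loss(model(x), y)
    deltas = []
    for i in range(len(x)):
        masked = list(x)
        masked[i] = baseline
        # Clip at zero: a feature whose removal reduces the error
        # is treated as contributing nothing.
        deltas.append(max(loss(model(masked), y) - base_error, 0.0))
    total = sum(deltas)
    if total == 0:
        # No feature changes the error: fall back to uniform scores.
        return [1.0 / len(x)] * len(x)
    return [d / total for d in deltas]


def toy_model(features):
    """Hypothetical linear model that ignores its second input."""
    return 3.0 * features[0] + 0.0 * features[1] + 1.0 * features[2]


def squared_error(prediction, target):
    return (prediction - target) ** 2


if __name__ == "__main__":
    print(masked_importance(toy_model, [1.0, 1.0, 1.0], 4.0, squared_error))
```

In this toy case the ignored input receives a score of zero and the input with the largest weight dominates, which is the behaviour one would want from an importance estimate.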

SP Feb 25, 2019
Forecasting intracranial hypertension using multi-scale waveform metrics

Matthias Hüser, Adrian Kündig, Walter Karlen et al.

Objective: Acute intracranial hypertension is an important risk factor of secondary brain damage after traumatic brain injury. Hypertensive episodes are often diagnosed reactively, leading to late detection and lost time for intervention planning. A pro-active approach that predicts critical events several hours ahead of time could assist in directing attention to patients at risk. Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours. It jointly uses cerebral auto-regulation indices, spectral energies and morphological pulse metrics to describe the neurological state of the patient. One-minute base windows were compressed by computing signal metrics, and then stored in a multi-scale history, from which physiological features were derived. Main results: Our model predicted events up to 8 hours in advance with alarm recall rates of 90% at a precision of 30.3% in the MIMIC-III waveform database, improving upon two baselines from the literature. We found that features derived from high-frequency waveforms substantially improved the prediction performance over simple statistical summaries of low-frequency time series, and each of the three feature classes contributed to the performance gain. The inclusion of long-term history up to 8 hours was especially important. Significance: Our results highlight the importance of information contained in high-frequency waveforms in the neurological intensive care unit. They could motivate future studies on pre-hypertensive patterns and the design of new alarm algorithms for critical events in the injured brain.
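The idea of a multi-scale history built from one-minute base windows can be sketched as follows. The choice of summary statistics (mean and range) and of the 5- and 60-minute scales is an assumption for illustration; only the one-minute base window and the 8-hour (480-minute) maximum history come from the abstract. The function name `multi_scale_features` is hypothetical.

```python
from statistics import mean


def multi_scale_features(per_minute_metric, scales_min=(5, 60, 480)):
    """Summarise the most recent history of a per-minute metric.

    per_minute_metric: one value per one-minute base window, oldest first
    scales_min:        look-back horizons in minutes (assumed values)
    """
    if not per_minute_metric:
        raise ValueError("history must contain at least one window")
    features = {}
    for scale in scales_min:
        # Uses whatever history is available if it is shorter than the scale.
        window = per_minute_metric[-scale:]
        features[f"mean_{scale}min"] = mean(window)
        features[f"range_{scale}min"] = max(window) - min(window)
    return features


if __name__ == "__main__":
    # Synthetic ramp standing in for a per-minute signal metric.
    history = [float(minute) for minute in range(480)]
    for name, value in multi_scale_features(history).items():
        print(name, value)
```

Summarising the same metric over short and long horizons lets a downstream model see both abrupt changes and slow trends without processing the raw waveform history directly.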

LG Feb 3, 2019
Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

Patrick Schwab, Lorenz Linhardt, Stefan Bauer et al.

Estimating an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response.

LG Oct 1, 2018
Perfect Match: A Simple Method for Learning Representations For Counterfactual Inference With Neural Networks

Patrick Schwab, Lorenz Linhardt, Walter Karlen

Learning representations for counterfactual inference from observational data is of high practical relevance for many domains, such as healthcare, public policy and economics. Counterfactual inference enables one to answer "What if...?" questions, such as "What would be the outcome if we gave this patient treatment $t_1$?". However, current methods for training neural networks for counterfactual inference on observational data are either overly complex, limited to settings with only two available treatments, or both. Here, we present Perfect Match (PM), a method for training neural networks for counterfactual inference that is easy to implement, compatible with any architecture, does not add computational complexity or hyperparameters, and extends to any number of treatments. PM is based on the idea of augmenting samples within a minibatch with their propensity-matched nearest neighbours. Our experiments demonstrate that PM outperforms a number of more complex state-of-the-art methods in inferring counterfactual outcomes across several benchmarks, particularly in settings with many treatments.
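The minibatch augmentation described above can be sketched roughly as follows. For simplicity, the sketch assumes that each sample has a single precomputed scalar propensity score and that matches are drawn from the whole training set; the function name `augment_minibatch` and these simplifications are illustrative assumptions, not the published implementation.

```python
def augment_minibatch(batch_indices, treatments, propensity, num_treatments):
    """Add, for every sample in the batch and every other treatment, the
    sample with that treatment whose propensity score is closest.

    treatments: treatment index assigned to each training sample
    propensity: scalar propensity score per training sample (assumed
                to be precomputed by a separate model)
    """
    augmented = []
    for i in batch_indices:
        augmented.append(i)
        for t in range(num_treatments):
            if t == treatments[i]:
                continue
            candidates = [j for j, tj in enumerate(treatments) if tj == t]
            if not candidates:
                continue  # no sample received treatment t
            nearest = min(candidates,
                          key=lambda j: abs(propensity[j] - propensity[i]))
            augmented.append(nearest)
    return augmented


if __name__ == "__main__":
    # Toy data: five samples, three treatments.
    treatments = [0, 1, 1, 0, 2]
    propensity = [0.20, 0.25, 0.80, 0.70, 0.30]
    print(augment_minibatch([0], treatments, propensity, num_treatments=3))
```

Because the augmentation only changes which samples enter a minibatch, it leaves the network architecture and loss untouched, which is consistent with the abstract's claim that the method adds no hyperparameters.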

NC Oct 1, 2018
PhoneMD: Learning to Diagnose Parkinson's Disease from Smartphone Data

Patrick Schwab, Walter Karlen

Parkinson's disease is a neurodegenerative disease that can affect a person's movement, speech, dexterity, and cognition. Clinicians primarily diagnose Parkinson's disease by performing a clinical assessment of symptoms. However, misdiagnoses are common. One factor that contributes to misdiagnoses is that the symptoms of Parkinson's disease may not be prominent at the time the clinical assessment is performed. Here, we present a machine-learning approach towards distinguishing between people with and without Parkinson's disease using long-term data from smartphone-based walking, voice, tapping and memory tests. We demonstrate that our attentive deep-learning models achieve significant improvements in predictive performance over strong baselines (area under the receiver operating characteristic curve = 0.85) in data from a cohort of 1853 participants. We also show that our models identify meaningful features in the input data. Our results confirm that smartphone data collected over extended periods of time could in the future potentially be used as a digital biomarker for the diagnosis of Parkinson's disease.

LG Feb 14, 2018
Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care

Patrick Schwab, Emanuela Keller, Carl Muroi et al.

Patients in the intensive care unit (ICU) require constant and close supervision. To assist clinical staff in this task, hospitals use monitoring systems that trigger audiovisual alarms if their algorithms indicate that a patient's condition may be worsening. However, current monitoring systems are extremely sensitive to movement artefacts and technical errors. As a result, they typically trigger hundreds to thousands of false alarms per patient per day, drowning the important alarms in noise and adding to the exhaustion of clinical staff. In this setting, data is abundantly available, but obtaining trustworthy annotations by experts is laborious and expensive. We frame the problem of false alarm reduction from multivariate time series as a machine-learning task and address it with a novel multitask network architecture that utilises distant supervision through multiple related auxiliary tasks in order to reduce the number of expensive labels required for training. We show that our approach leads to significant improvements over several state-of-the-art baselines on real-world ICU data and provide new insights on the importance of task selection and architectural choices in distantly supervised multitask learning.

LG Feb 6, 2018
Granger-causal Attentive Mixtures of Experts: Learning Important Features with Neural Networks

Patrick Schwab, Djordje Miladinovic, Walter Karlen

Knowledge of the importance of input features towards decisions made by machine-learning models is essential to increase our understanding of both the models and the underlying data. Here, we present a new approach to estimating feature importance with neural networks based on the idea of distributing the features of interest among experts in an attentive mixture of experts (AME). AMEs use attentive gating networks trained with a Granger-causal objective to learn to jointly produce accurate predictions as well as estimates of feature importance in a single model. Our experiments show (i) that the feature importance estimates provided by AMEs compare favourably to those provided by state-of-the-art methods, (ii) that AMEs are significantly faster at estimating feature importance than existing methods, and (iii) that the associations discovered by AMEs are consistent with those reported by domain experts.

LG Oct 17, 2017
Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks

Patrick Schwab, Gaetano Scebba, Jia Zhang et al.

With tens of thousands of electrocardiogram (ECG) records processed by mobile cardiac event recorders every day, heart rhythm classification algorithms are an important tool for the continuous monitoring of patients at risk. We utilise an annotated dataset of 12,186 single-lead ECG recordings to build a diverse ensemble of recurrent neural networks (RNNs) that is able to distinguish between normal sinus rhythms, atrial fibrillation, other types of arrhythmia and signals that are too noisy to interpret. In order to ease learning over the temporal dimension, we introduce a novel task formulation that harnesses the natural segmentation of ECG signals into heartbeats to drastically reduce the number of time steps per sequence. Additionally, we extend our RNNs with an attention mechanism that enables us to reason about which heartbeats our RNNs focus on to make their decisions. Through the use of attention, our model maintains a high degree of interpretability, while also achieving state-of-the-art classification performance with an average F1 score of 0.79 on an unseen test set (n=3,658).