Christian Poellabauer

CR
h-index6
19papers
881citations
Novelty37%
AI Score46

19 Papers

LGApr 19
STEP-PD: Stage-Aware and Explainable Parkinson's Disease Severity Classification Using Multimodal Clinical Assessments

Md Mezbahul Islam, John Michael Templeton, Christian Poellabauer et al.

Parkinson's disease (PD) is a progressive disorder in which symptom burden and functional impairment evolve over time, making severity staging essential for clinical monitoring and treatment planning. However, many computational studies emphasize binary PD detection and do not fully use repeated follow-up clinical assessments for stage-aware prediction. This study proposes STEP-PD, a severity-aware machine learning framework to classify PD severity using clinically interpretable boundaries. It leverages all available visits from the Parkinson's Progression Markers Initiative (PPMI) and integrates routinely collected subjective questionnaires and objective clinician-assessed measures. Disease severity is defined using Hoehn and Yahr staging and grouped into three clinically meaningful categories: Healthy, Mild PD (stages 1-2), and Moderate-to-Severe PD (stages 3-5). Three binary classification problems and a three-class severity task were evaluated using stratified cross-validation with imbalance-aware training. To enhance interpretability, SHAP was used to provide global explanations and local patient-level waterfall explanations. Across all tasks, XGBoost achieved the strongest and most stable performance, with accuracies of 95.48% (Healthy vs. Mild), 99.44% (Healthy vs. Moderate-to-Severe), and 96.78% (Mild vs. Moderate-to-Severe), and 94.14% accuracy with 0.8775 Macro-F1 for three-class severity classification. Explainability results highlight a shift from early motor features to progression-related axial and balance impairments. These findings show that multimodal clinical assessments within the PPMI cohort can support accurate and interpretable visit-level PD severity stratification.

LGJan 30
SCOPE-PD: Explainable AI on Subjective and Clinical Objective Measurements of Parkinson's Disease for Precision Decision-Making

Md Mezbahul Islam, John Michael Templeton, Masrur Sobhan et al.

Parkinson's disease (PD) is a chronic and complex neurodegenerative disorder influenced by genetic, clinical, and lifestyle factors. Predicting this disease early is challenging because it depends on traditional diagnostic methods that face issues of subjectivity, which commonly delay diagnosis. Several objective analyses are currently in practice to help overcome the challenges of subjectivity; however, a proper explanation of these analyses is still lacking. While machine learning (ML) has demonstrated potential in supporting PD diagnosis, existing approaches often rely on subjective reports only and lack interpretability for individualized risk estimation. This study proposes SCOPE-PD, an explainable AI-based prediction framework, by integrating subjective and objective assessments to provide personalized health decisions. Subjective and objective clinical assessment data are collected from the Parkinson's Progression Markers Initiative (PPMI) study to construct a multimodal prediction framework. Several ML techniques are applied to these data, and the best ML model is selected to interpret the results. Model interpretability is examined using SHAP-based analysis. The Random Forest algorithm achieves the highest accuracy of 98.66 percent using combined features from both subjective and objective test data. Tremor, bradykinesia, and facial expression are identified as the top three contributing features from the MDS-UPDRS test in the prediction of PD.

CLOct 27, 2024
Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs

Enshi Zhang, Christian Poellabauer

Speech Emotion Recognition (SER) focuses on identifying emotional states from spoken language. The 2024 IEEE SLT-GenSEC Challenge on Post Automatic Speech Recognition (ASR) Emotion Recognition tasks participants to explore the capabilities of large language models (LLMs) for emotion recognition using only text data. We propose a novel approach that first refines all available transcriptions to ensure data reliability. We then segment each complete conversation into smaller dialogues and use these dialogues as context to predict the emotion of the target utterance within the dialogue. Finally, we investigated different context lengths and prompting techniques to improve prediction accuracy. Our best submission exceeded the baseline by 20% in unweighted accuracy, achieving the best performance in the challenge. All our experiments' codes, prediction results, and log files are publicly available.

AINov 24, 2025
From Wearables to Warnings: Predicting Pain Spikes in Patients with Opioid Use Disorder

Abhay Goyal, Navin Kumar, Kimberly DiMeola et al.

Chronic pain (CP) and opioid use disorder (OUD) are common and interrelated chronic medical conditions. Currently, there is a paucity of evidence-based integrated treatments for CP and OUD among individuals receiving medication for opioid use disorder (MOUD). Wearable devices have the potential to monitor complex patient information and inform treatment development for persons with OUD and CP, including pain variability (e.g., exacerbations of pain or pain spikes) and clinical correlates (e.g., perceived stress). However, the application of large language models (LLMs) with wearable data for understanding pain spikes, remains unexplored. Consequently, the aim of this pilot study was to examine the clinical correlates of pain spikes using a range of AI approaches. We found that machine learning models achieved relatively high accuracy (>0.7) in predicting pain spikes, while LLMs were limited in providing insights on pain spikes. Real-time monitoring through wearable devices, combined with advanced AI models, could facilitate early detection of pain spikes and support personalized interventions that may help mitigate the risk of opioid relapse, improve adherence to MOUD, and enhance the integration of CP and OUD care. Given overall limited LLM performance, these findings highlight the need to develop LLMs which can provide actionable insights in the OUD/CP context.

CLJan 9, 2022
Medication Error Detection Using Contextual Language Models

Yu Jiang, Christian Poellabauer

Medication errors most commonly occur at the ordering or prescribing stage, potentially leading to medical complications and poor health outcomes. While it is possible to catch these errors using different techniques; the focus of this work is on textual and contextual analysis of prescription information to detect and prevent potential medication errors. In this paper, we demonstrate how to use BERT-based contextual language models to detect anomalies in written or spoken text based on a data set extracted from real-world medical data of thousands of patient records. The proposed models are able to learn patterns of text dependency and predict erroneous output based on contextual information such as patient data. The experimental results yield accuracy up to 96.63% for text input and up to 79.55% for speech input, which is satisfactory for most real-world applications.

HCNov 11, 2020
Understanding College Students' Phone Call Behaviors Towards a Sustainable Mobile Health and Wellbeing Solution

Yugyeong Kim, Sudip Vhaduri, Christian Poellabauer

During the transition from high school to on-campus college life, a student leaves home and starts facing enormous life changes, including meeting new people, more responsibilities, being away from family, and academic challenges. These recent changes lead to an elevation of stress and anxiety, affecting a student's health and wellbeing. With the help of smartphones and their rich collection of sensors, we can continuously monitor various factors that affect students' behavioral patterns, such as communication behaviors associated with their health, wellbeing, and academic success. In this work, we try to assess college students' communication patterns (in terms of phone call duration and frequency) that vary across various geographical contexts (e.g., dormitories, classes, dining) during different times (e.g., epochs of a day, days of a week) using visualization techniques. Findings from this work will help foster the design and delivery of smartphone-based health interventions; thereby, help the students adapt to the changes in life.

SDMar 18, 2020
Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method

Yuan Gong, Jian Yang, Christian Poellabauer

With the rapidly growing number of security-sensitive systems that use voice as the primary input, it becomes increasingly important to address these systems' potential vulnerability to replay attacks. Previous efforts to address this concern have focused primarily on single-channel audio. In this paper, we introduce a novel neural network-based replay attack detection model that further leverages spatial information of multi-channel audio and is able to significantly improve the replay attack detection performance.

CVAug 31, 2019
Second-order Non-local Attention Networks for Person Re-identification

Bryan, Xia, Yuan Gong et al.

Recent efforts have shown promising results for person re-identification by designing part-based architectures to allow a neural network to learn discriminative representations from semantically coherent parts. Some efforts use soft attention to reallocate distant outliers to their most similar parts, while others adjust part granularity to incorporate more distant positions for learning the relationships. Others seek to generalize part-based methods by introducing a dropout mechanism on consecutive regions of the feature map to enhance distant region relationships. However, only few prior efforts model the distant or non-local positions of the feature map directly for the person re-ID task. In this paper, we propose a novel attention mechanism to directly model long-range relationships via second-order feature statistics. When combined with a generalized DropBlock module, our method performs equally to or better than state-of-the-art results for mainstream person re-identification datasets, including Market1501, CUHK03, and DukeMTMC-reID.

SIAug 6, 2019
The power of dynamic social networks to predict individuals' mental health

Shikang Liu, David Hachen, Omar Lizardo et al.

Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to develop a predictive model of one's likelihood to be depressed or anxious from rich dynamic social network data. To our knowledge, we are the first to do this. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without developing a predictive model; or they study other individual traits but not mental health. In a systematic and comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.

CRJul 15, 2019
Summary: Multi-modal Biometric-based Implicit Authentication of Wearable Device Users

Sudip Vhaduri, Christian Poellabauer

The Internet of Things (IoT) is increasingly empowering people with an interconnected world of physical objects ranging from smart buildings to portable smart devices such as wearables. With recent advances in mobile sensing, wearables have become a rich collection of portable sensors and are able to provide various types of services including tracking of health and fitness, making financial transactions, and unlocking smart locks and vehicles. Most of these services are delivered based on users' confidential and personal data, which are stored on these wearables. Existing explicit authentication approaches (i.e., PINs or pattern locks) for wearables suffer from several limitations, including small or no displays, risk of shoulder surfing, and users' recall burden. Oftentimes, users completely disable security features out of convenience. Therefore, there is a need for a burden-free (implicit) authentication mechanism for wearable device users based on easily obtainable biometric data. In this paper, we present an implicit wearable device user authentication mechanism using combinations of three types of coarse-grain minute-level biometrics: behavioral (step counts), physiological (heart rate), and hybrid (calorie burn and metabolic equivalent of task). From our analysis of over 400 Fitbit users from a 17-month long health study, we are able to authenticate subjects with average accuracy values of around .93 (sedentary) and .90 (non-sedentary) with equal error rates of .05 using binary SVM classifiers. Our findings also show that the hybrid biometrics perform better than other biometrics and behavioral biometrics do not have a significant impact, even during non-sedentary periods.

SIJun 11, 2019
Heterogeneous network approach to predict individuals' mental health

Shikang Liu, Fatemeh Vahedian, David Hachen et al.

Depression and anxiety are critical public health issues affecting millions of people around the world. To identify individuals who are vulnerable to depression and anxiety, predictive models have been built that typically utilize data from one source. Unlike these traditional models, in this study, we leverage a rich heterogeneous data set from the University of Notre Dame's NetHealth study that collected individuals' (student participants') social interaction data via smartphones, health-related behavioral data via wearables (Fitbit), and trait data from surveys. To integrate the different types of information, we model the NetHealth data as a heterogeneous information network (HIN). Then, we redefine the problem of predicting individuals' mental health conditions (depression or anxiety) in a novel manner, as applying to our HIN a popular paradigm of a recommender system (RS), which is typically used to predict the preference that a person would give to an item (e.g., a movie or book). In our case, the items are the individuals' different mental health states. We evaluate four state-of-the-art RS approaches. Also, we model the prediction of individuals' mental health as another problem type - that of node classification (NC) in our HIN, evaluating in the process four node features under logistic regression as a proof-of-concept classifier. We find that our RS and NC network methods produce more accurate predictions than a logistic regression model using the same NetHealth data in the traditional non-network fashion as well as a random-approach. Also, we find that the best of the considered RS approaches outperforms all considered NC approaches. This is the first study to integrate smartphone, wearable sensor, and survey data in an HIN manner and use RS or NC on the HIN to predict individuals' mental health conditions.

CRMay 31, 2019
Real-Time Adversarial Attacks

Yuan Gong, Boyang Li, Christian Poellabauer et al.

In recent years, many efforts have demonstrated that modern machine learning algorithms are vulnerable to adversarial attacks, where small, but carefully crafted, perturbations on the input can make them fail. While these attack methods are very effective, they only focus on scenarios where the target model takes static input, i.e., an attacker can observe the entire original sample and then add a perturbation at any point of the sample. These attack approaches are not applicable to situations where the target model takes streaming input, i.e., an attacker is only able to observe past data points and add perturbations to the remaining (unobserved) data points of the input. In this paper, we propose a real-time adversarial attack scheme for machine learning models with streaming inputs.

CRApr 6, 2019
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems

Yuan Gong, Jian Yang, Jacob Huber et al.

This paper introduces a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCSs usage scenarios and using modern voice assistant development kits. Specifically, the database contains recordings from four systems (each with a different microphone array) in a variety of environmental conditions with different forms of background noise and relative positions between speaker and device. To the best of our knowledge, this is the first publicly available database that has been specifically designed for the protection of state-of-the-art voice-controlled systems against various replay attacks in various conditions and environments.

CRNov 16, 2018
Biometric-Based Wearable User Authentication During Sedentary and Non-sedentary Periods

Sudip Vhaduri, Christian Poellabauer

The Internet of Things (IoT) is increasingly empowering people with an interconnected world of physical objects ranging from smart buildings to portable smart devices such as wearables. With the recent advances in mobile sensing, wearables have become a rich collection of portable sensors and are able to provide various types of services including health and fitness tracking, financial transactions, and unlocking smart locks and vehicles. Existing explicit authentication approaches (i.e., PINs or pattern locks) suffer from several limitations including limited display size, shoulder surfing, and recall burden. Oftentimes, users completely disable security features out of convenience. Therefore, there is a need for a burden-free (implicit) authentication mechanism for wearable device users based on easily obtainable biometric data. In this paper, we present an implicit wearable device user authentication mechanism using combinations of three types of coarse-grained minute-level biometrics: behavioral (step counts), physiological (heart rate), and hybrid (calorie burn and metabolic equivalent of task). From our analysis of 421 Fitbit users from a two-year long health study, we are able to authenticate subjects with average accuracy values of around 92% and 88% during sedentary and non-sedentary periods, respectively. Our findings also show that (a) behavioral biometrics do not work well during sedentary periods and (b) hybrid biometrics typically perform better than other biometrics.

CRNov 16, 2018
Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues

Yuan Gong, Christian Poellabauer

Over the last few years, a rapidly increasing number of Internet-of-Things (IoT) systems that adopt voice as the primary user input have emerged. These systems have been shown to be vulnerable to various types of voice spoofing attacks. Existing defense techniques can usually only protect from a specific type of attack or require an additional authentication step that involves another device. Such defense strategies are either not strong enough or lower the usability of the system. Based on the fact that legitimate voice commands should only come from humans rather than a playback device, we propose a novel defense strategy that is able to detect the sound source of a voice command based on its acoustic features. The proposed defense strategy does not require any information other than the voice command itself and can protect a system from multiple types of spoofing attacks. Our proof-of-concept experiments verify the feasibility and effectiveness of this defense strategy.

SDAug 8, 2018
Towards Learning Fine-Grained Disentangled Representations from Speech

Yuan Gong, Christian Poellabauer

Learning disentangled representations of high-dimensional data is currently an active research area. However, compared to the field of computer vision, less work has been done for speech processing. In this paper, we provide a review of two representative efforts on this topic and propose the novel concept of fine-grained disentangled speech representation learning.

CLMar 28, 2018
Topic Modeling Based Multi-modal Depression Detection

Yuan Gong, Christian Poellabauer

Major depressive disorder is a common mental disorder that affects almost 7% of the adult U.S. population. The 2017 Audio/Visual Emotion Challenge (AVEC) asks participants to build a model to predict depression levels based on the audio, video, and text of an interview ranging between 7-33 minutes. Since averaging features over the entire interview will lose most temporal information, how to discover, capture, and preserve useful temporal details for such a long interview are significant challenges. Therefore, we propose a novel topic modeling based approach to perform context-aware analysis of the recording. Our experiments show that the proposed approach outperforms context-unaware methods and the challenge baselines for all metrics.

CRMar 24, 2018
An Overview of Vulnerabilities of Voice Controlled Systems

Yuan Gong, Christian Poellabauer

Over the last few years, a rapidly increasing number of Internet-of-Things (IoT) systems that adopt voice as the primary user input have emerged. These systems have been shown to be vulnerable to various types of voice spoofing attacks. However, how exactly these techniques differ or relate to each other has not been extensively studied. In this paper, we provide a survey of recent attack and defense techniques for voice controlled systems and propose a classification of these techniques. We also discuss the need for a universal defense strategy that protects a system from various types of attacks.

LGNov 9, 2017
Crafting Adversarial Examples For Speech Paralinguistics Applications

Yuan Gong, Christian Poellabauer

Computational paralinguistic analysis is increasingly being used in a wide range of cyber applications, including security-sensitive applications such as speaker verification, deceptive speech detection, and medical diagnostics. While state-of-the-art machine learning techniques, such as deep neural networks, can provide robust and accurate speech analysis, they are susceptible to adversarial attacks. In this work, we propose an end-to-end scheme to generate adversarial examples for computational paralinguistic applications by perturbing directly the raw waveform of an audio recording rather than specific acoustic features. Our experiments show that the proposed adversarial perturbation can lead to a significant performance drop of state-of-the-art deep neural networks, while only minimally impairing the audio quality.