Kleanthis Avramidis

LG
h-index22
21papers
101citations
Novelty47%
AI Score55

21 Papers

LGSep 26, 2023Code
Scaling Representation Learning from Ubiquitous ECG with State-Space Models

Kleanthis Avramidis, Dominika Kunc, Bartosz Perz et al.

Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health promoting scaffolds. But the large volumes of data therein across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representation Learning from biological signals is an emerging realm catalyzed by the recent advances in computational modeling and the abundance of publicly shared databases. The electrocardiogram (ECG) is the primary researched modality in this context, with applications in health monitoring, stress and affect estimation. Yet, most studies are limited by small-scale controlled data collection and over-parameterized architecture choices. We introduce \textbf{WildECG}, a pre-trained state-space model for representation learning from ECG signals. We train this model in a self-supervised manner with 275,000 10s ECG recordings collected in the wild and evaluate it on a range of downstream tasks. The proposed model is a robust backbone for ECG analysis, providing competitive performance on most of the tasks considered, while demonstrating efficacy in low-resource regimes. The code and pre-trained weights are shared publicly at https://github.com/klean2050/tiles_ecg_model.

IVJul 10, 2022
Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning

Kleanthis Avramidis, Mohammad Rostami, Melinda Chang et al.

Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especially in the presence of challenges posed by pseudopapilledema that has similar fundus appearance but distinct clinical implications. We present a deep learning-based algorithm for the automatic detection of pediatric papilledema. Our approach is based on optic disc localization and detection of explainable papilledema indicators through data augmentation. Experiments on real-world clinical data demonstrate that our proposed method is effective with a diagnostic accuracy comparable to expert ophthalmologists.

SPApr 17, 2023
Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients

Kleanthis Avramidis, Kranti Adsul, Digbalay Bose et al.

This paper presents the approach and results of USC SAIL's submission to the Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting relapses in psychotic patients. Relapse prediction has proven to be challenging, primarily due to the heterogeneity of symptoms and responses to treatment between individuals. We address these challenges by investigating the use of sleep behavior features to estimate relapse days as outliers in an unsupervised machine learning setting. We extract informative features from human activity and heart rate data collected in the wild, and evaluate various combinations of feature types and time resolutions. We found that short-time sleep behavior features outperformed their awake counterparts and larger time intervals. Our submission was ranked 3rd in the Task's official leaderboard, demonstrating the potential of such features as an objective and non-invasive predictor of psychotic relapses.

SPNov 15, 2025Code
Informed Bootstrap Augmentation Improves EEG Decoding

Woojae Jeong, Wenhui Cui, Kleanthis Avramidis et al.

Electroencephalography (EEG) offers detailed access to neural dynamics but remains constrained by noise and trial-by-trial variability, limiting decoding performance in data-restricted or complex paradigms. Data augmentation is often employed to enhance feature representations, yet conventional uniform averaging overlooks differences in trial informativeness and can degrade representational quality. We introduce a weighted bootstrapping approach that prioritizes more reliable trials to generate higher-quality augmented samples. In a Sentence Evaluation paradigm, weights were computed from relative ERP differences and applied during probabilistic sampling and averaging. Across conditions, weighted bootstrapping improved decoding accuracy relative to unweighted (from 68.35% to 71.25% at best), demonstrating that emphasizing reliable trials strengthens representational quality. The results demonstrate that reliability-based augmentation yields more robust and discriminative EEG representations. The code is publicly available at https://github.com/lyricists/NeuroBootstrap.

63.8LGMay 27
A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis et al.

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the learned representations. Recent EEG foundation models have demonstrated promising transfer capabilities across tasks and datasets, motivating their growing use in neurotechnology and clinical applications. However, these models are typically evaluated under full fine-tuning on well-curated downstream datasets, a setting that does not reflect biomedical domain constraints such as limited labeled data, reduced sensor coverage, or parameter-efficient adaptation. In this work, we propose a multi-dimensional evaluation framework for assessing EEG models under realistic low-resource conditions. Empirical analysis of both supervised EEG models and recent EEG foundation models, including LaBraM, CSBrain, and CBraMod, across 6 different datasets is performed under the proposed multi-dimensional evaluation framework. We find that EEG foundation models consistently provide performance gains on long-context tasks such as sleep stage prediction and mental health state classification. In contrast, for short-window Brain Computer Interface style tasks, supervised models achieve comparable despite having substantially fewer parameters. Additional analyses demonstrate that current foundation models provide limited robustness to short-window tasks and channel constrained settings. Together, these findings motivate the use of multi-dimensional evaluation protocols that characterize model behavior under realistic use constraints.

80.0LGMay 26
Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis et al.

EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail to outperform significantly smaller supervised models in low-resource settings compared to fully supervised models. We provide a mechanistic account of this shortcoming, attributing it to a fundamental mismatch between reconstruction-based pretext tasks and the idiosyncratic spectral structure of EEG signals, which decompose into distinct high-power aperiodic and low-power oscillatory components. Using controlled, synthetically-generated EEG inputs, we demonstrate that EEG foundation model embeddings are biased to capture the aperiodic components of the EEG signal while under-representing oscillatory components, particularly at higher frequencies. Additionally, linear probe evaluations on real-world BCI datasets further reveal that embeddings encode subject identity more strongly than task-relevant information, thereby reinforcing the low-frequency and aperiodic component bias in foundation model embeddings trained primarily on reconstruction based objectives. Together, these findings elucidate a failure mode in reconstruction based EEG foundation models and motivate future work to incorporate auxiliary losses explicitly targeting high-frequency oscillatory structure as a path toward more capable and generalizable EEG representations.

50.4MMApr 21
Smiling Regulates Emotion During Traumatic Recollection

Marcus Ma, Emily Zhou, Leonard Ludwig et al.

We study when, where, and why 978 Holocaust survivors smile in video testimonies. We create an automatic smile detection model from facial features with an F1 of 85% and annotate detected smiles under two established taxonomies of smiling. We produce narrative features on 1,083,417 transcript sentences as well as emotional valence from three different modalities: audio, eye gaze, and text transcript. Smiling rates are significantly correlated with specific semantic topics, narrative structures, and temporal syntaxes across the entire corpus. Smiles often occur during periods of intense negative affect; these negative-affect smiles improve the valence trajectory of surrounding sentences significantly across all three modalities. Smiling reduces eye dynamics and blink rates, and the strength of both of these effects is also modulated by narrative valence. Taken together, we conclude that smiling plays a critical role in regulating emotion and social interaction during traumatic recollection.

43.5ASMar 12
Affect Decoding in Phonated and Silent Speech Production from Surface EMG

Simon Pistrosch, Kleanthis Avramidis, Tiantian Feng et al.

The expression of affect is integral to spoken communication, yet, its link to underlying articulatory execution remains unclear. Measures of articulatory muscle activity such as EMG could reveal how speech production is modulated by emotion alongside acoustic speech analyses. We investigate affect decoding from facial and neck surface electromyography (sEMG) during phonated and silent speech production. For this purpose, we introduce a dataset comprising 2,780 utterances from 12 participants across 3 tasks, on which we evaluate both intra- and inter-subject decoding using a range of features and model embeddings. Our results reveal that EMG representations reliably discriminate frustration with up to 0.845 AUC, and generalize well across articulation modes. Our ablation study further demonstrates that affective signatures are embedded in facial motor activity and persist in the absence of phonation, highlighting the potential of EMG sensing for affect-aware silent speech interfaces.

38.1SDMar 11
VoxCare: Studying Natural Communication Behaviors of Hospital Caregivers through Wearable Sensing of Egocentric Audio

Tiantian Feng, Kleanthis Avramidis, Anfeng Xu et al.

Healthcare professionals work in complex, high-stakes environments where effective communication is critical for care delivery, team coordination, and individual well-being. However, communication activity in everyday clinical settings remains challenging to measure and largely unexplored in human behavioral research. We present VoxCare, a scalable egocentric wearable audio sensing and computing system that captures natural communication behaviors of hospital professionals in real-world settings without storing raw audio. VoxCare performs real-time, on-device acoustic feature extraction and applies a speech foundation model-guided teacher-student framework to identify foreground speech activity. From these features, VoxCare derives interpretable behavioral measures of communication frequency, duration, and vocal arousal. Our analyses reveal how, when, and how often clinicians communicate across different shifts and working units, and suggest that communication activity reflects underlying workload and stress. By enabling continuous assessment of communication patterns in everyday contexts, this study provides data-driven approaches to understand the behaviors of healthcare providers and ultimately improve healthcare delivery.

22.1MMApr 23
Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors

Emily Zhou, Marcus Ma, Kleanthis Avramidis et al.

Eye movement and memory retrieval are deeply and bidirectionally intertwined, however existing literature is generally confined to controlled lab settings. We investigate the relationship between eye gaze and memory recall in free-form autobiographical recall, which comprises both autonoetic consciousness -- the ability to mentally place oneself in the past or future -- and various affective states. Using a large video corpus of semi-naturalistic interviews with Holocaust survivors (N = 806), we examine eye movements with respect to episodic, semantic, affective, and temporal dimensions of traumatic and highly emotional autobiographical recall. We observe gaze patterns vary significantly across certain temporal contexts, most prominently in vertical eye movements. We additionally train intra-subject sequence models to predict temporal context of sentences from segments of gaze features, and find that eye movements entirely preceding sentence onset are sufficient for prediction. Our results corroborate prior findings in literature linking eye movements to memory in controlled and semi-structured settings, reinforcing the role of eye gaze in retrieving and constructing memories, especially in highly emotional and remote memory recall.

CVJul 20, 2024
Early Detection of Coffee Leaf Rust Through Convolutional Neural Networks Trained on Low-Resolution Images

Angelly Cabrera, Kleanthis Avramidis, Shrikanth Narayanan

Coffee leaf rust, a foliar disease caused by the fungus Hemileia vastatrix, poses a major threat to coffee production, especially in Central America. Climate change further aggravates this issue, as it shortens the latency period between initial infection and the emergence of visible symptoms in diseases like leaf rust. Shortened latency periods can lead to more severe plant epidemics and faster spread of diseases. There is, hence, an urgent need for effective disease management strategies. To address these challenges, we explore the potential of deep learning models for enhancing early disease detection. However, deep learning models require extensive processing power and large amounts of data for model training, resources that are typically scarce. To overcome these barriers, we propose a preprocessing technique that involves convolving training images with a high-pass filter to enhance lesion-leaf contrast, significantly improving model efficacy in resource-limited environments. This method and our model demonstrated a strong performance, achieving over 90% across all evaluation metrics--including precision, recall, F1-score, and the Dice coefficient. Our experiments show that this approach outperforms other methods, including two different image preprocessing techniques and using unaltered, full-color images.

SPJun 12, 2024Code
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

Jihwan Lee, Aditya Kommineni, Tiantian Feng et al.

Speech decoding from EEG signals is a challenging task, where brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, where no intermediate acoustic feature processing step is required. The proposed method consists of an EEG module and a speech module along with a connector. The EEG module learns to better represent EEG signals, while the speech module generates speech waveforms from model representations. The connector learns to bridge the distributions of the latent spaces of EEG and speech. The proposed framework is both simple and efficient, by allowing single-step inference, and outperforms prior works on objective metrics. A fine-grained phoneme analysis is conducted to unveil model characteristics of speech decoding. The source code is available here: github.com/lee-jhwn/fesde.

LGFeb 15, 2024
Knowledge-guided EEG Representation Learning

Aditya Kommineni, Kleanthis Avramidis, Richard Leahy et al.

Self-supervised learning has produced impressive results in multimedia domains of audio, vision and speech. This paradigm is equally, if not more, relevant for the domain of biosignals, owing to the scarcity of labelled data in such scenarios. The ability to leverage large-scale unlabelled data to learn robust representations could help improve the performance of numerous inference tasks on biosignals. Given the inherent domain differences between multimedia modalities and biosignals, the established objectives for self-supervised learning may not translate well to this domain. Hence, there is an unmet need to adapt these methods to biosignal analysis. In this work we propose a self-supervised model for EEG, which provides robust performance and remarkable parameter efficiency by using state space-based deep learning architecture. We also propose a novel knowledge-guided pre-training objective that accounts for the idiosyncrasies of the EEG signal. The results indicate improved embedding representation learning and downstream performance compared to prior works on exemplary tasks. Also, the proposed objective significantly reduces the amount of pre-training data required to obtain performance equivalent to prior works.

ASMay 20, 2025
Articulatory Feature Prediction from Surface EMG during Speech Production

Jihwan Lee, Kevin Huang, Kleanthis Avramidis et al.

We present a model for predicting articulatory features from surface electromyography (EMG) signals during speech production. The proposed model integrates convolutional layers and a Transformer block, followed by separate predictors for articulatory features. Our approach achieves a high prediction correlation of approximately 0.9 for most articulatory features. Furthermore, we demonstrate that these predicted articulatory features can be decoded into intelligible speech waveforms. To our knowledge, this is the first method to decode speech waveforms from surface EMG via articulatory features, offering a novel approach to EMG-based speech synthesis. Additionally, we analyze the relationship between EMG electrode placement and articulatory feature predictability, providing knowledge-driven insights for optimizing EMG electrode configurations. The source code and decoded speech samples are publicly available.

NCJul 30, 2025
Time-Resolved EEG Decoding of Semantic Processing Reveals Altered Neural Dynamics in Depression and Suicidality

Woojae Jeong, Aditya Kommineni, Kleanthis Avramidis et al.

Depression and suicidality affect cognitive and emotional processes, yet objective, task-evoked neural readouts of mental health remain limited. We investigated the spatiotemporal dynamics of affective semantic processing using multivariate decoding of time-resolved, 64-channel electroencephalography (EEG). Participants (N=137) performed a sentence-evaluation task with emotionally salient, self-referential statements. We identified robust neural signatures of semantic processing, with peak decoding accuracy between 300-600 ms -- a window associated with rapid, stimulus-driven semantic evaluation and conflict monitoring. Relative to healthy controls, individuals with depression and suicidal ideation showed earlier onset, longer duration, and greater amplitude decoding responses, along with broader cross-temporal generalization and enhanced contributions from frontocentral and parietotemporal components. These findings suggest altered sensitivity and impaired disengagement from emotionally salient content in the clinical groups, advancing our understanding of the neurocognitive basis of mental health and establishing a compact and interpretable EEG-based index of semantic-evaluation dynamics with potential diagnostic relevance.

LGOct 10, 2025
Neural Codecs as Biosignal Tokenizers

Kleanthis Avramidis, Tiantian Feng, Woojae Jeong et al.

Neurophysiological recordings such as electroencephalography (EEG) offer accessible and minimally invasive means of estimating physiological activity for applications in healthcare, diagnostic screening, and even immersive entertainment. However, these recordings yield high-dimensional, noisy time-series data that typically require extensive pre-processing and handcrafted feature extraction to reveal meaningful information. Recently, there has been a surge of interest in applying representation learning techniques from large pre-trained (foundation) models to effectively decode and interpret biosignals. We discuss the challenges posed for incorporating such methods and introduce BioCodec, an alternative representation learning framework inspired by neural codecs to capture low-level signal characteristics in the form of discrete tokens. Pre-trained on thousands of EEG hours, BioCodec shows efficacy across multiple downstream tasks, ranging from clinical diagnostic tasks and sleep physiology to decoding speech and motor imagery, particularly in low-resource settings. Additionally, we provide a qualitative analysis of codebook usage and estimate the spatial coherence of codebook embeddings from EEG connectivity. Notably, we also document the suitability of our method to other biosignal data, i.e., electromyographic (EMG) signals. Overall, the proposed approach provides a versatile solution for biosignal tokenization that performs competitively with state-of-the-art models. The source code and model checkpoints are shared.

LGApr 29, 2025
Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements

Kleanthis Avramidis, Woojae Jeong, Aditya Kommineni et al.

Identifying physiological and behavioral markers for mental health conditions is a longstanding challenge in psychiatry. Depression and suicidal ideation, in particular, lack objective biomarkers, with screening and diagnosis primarily relying on self-reports and clinical interviews. Here, we investigate eye tracking as a potential marker modality for screening purposes. Eye movements are directly modulated by neuronal networks and have been associated with attentional and mood-related patterns; however, their predictive value for depression and suicidality remains unclear. We recorded eye-tracking sequences from 126 young adults as they read and responded to affective sentences, and subsequently developed a deep learning framework to predict their clinical status. The proposed model included separate branches for trials of positive and negative sentiment, and used 2D time-series representations to account for both intra-trial and inter-trial variations. We were able to identify depression and suicidal ideation with an area under the receiver operating curve (AUC) of 0.793 (95% CI: 0.765-0.819) against healthy controls, and suicidality specifically with 0.826 AUC (95% CI: 0.797-0.852). The model also exhibited moderate, yet significant, accuracy in differentiating depressed from suicidal participants, with 0.609 AUC (95% CI 0.571-0.646). Discriminative patterns emerge more strongly when assessing the data relative to response generation than relative to the onset time of the final word of the sentences. The most pronounced effects were observed for negative-sentiment sentences, that are congruent to depressed and suicidal participants. Our findings highlight eye tracking as an objective tool for mental health assessment and underscore the modulatory impact of emotional stimuli on cognitive processes affecting oculomotor control.

SDFeb 20, 2022
Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and latent Domain Adaptation

Kleanthis Avramidis, Christos Garoufis, Athanasia Zlatintsi et al.

The study of Music Cognition and neural responses to music has been invaluable in understanding human emotions. Brain signals, though, manifest a highly complex structure that makes processing and retrieving meaningful features challenging, particularly of abstract constructs like affect. Moreover, the performance of learning models is undermined by the limited amount of available neuronal data and their severe inter-subject variability. In this paper we extract efficient, personalized affective representations from EEG signals during music listening. To this end, we employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space. We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities, further constraining the learning process with emotion tags. The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries. The experimental findings show the potential of enhancing neuronal data through stimulus information for recognition purposes and yield insights into the distribution and temporal variance of music-induced affective features.

SDFeb 13, 2021
Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Kleanthis Avramidis, Agelos Kratimenos, Christos Garoufis et al.

Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms. However, the emergence of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes. In this paper, we attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models. Various recurrent and convolutional architectures incorporating residual connections are examined and parameterized in order to build end-to-end classi-fiers with low computational cost and only minimal preprocessing. We obtain competitive classification scores and useful instrument-wise insight through the IRMAS test set, utilizing a parallel CNN-BiGRU model with multiple residual connections, while maintaining a significantly reduced number of trainable parameters.

SPOct 30, 2020
Multiscale Fractal Analysis on EEG Signals for Music-Induced Emotion Recognition

Kleanthis Avramidis, Athanasia Zlatintsi, Christos Garoufis et al.

Emotion Recognition from EEG signals has long been researched as it can assist numerous medical and rehabilitative applications. However, their complex and noisy structure has proven to be a serious barrier for traditional modeling methods. In this paper, we employ multifractal analysis to examine the behavior of EEG signals in terms of presence of fluctuations and the degree of fragmentation along their major frequency bands, for the task of emotion recognition. In order to extract emotion-related features we utilize two novel algorithms for EEG analysis, based on Multiscale Fractal Dimension and Multifractal Detrended Fluctuation Analysis. The proposed feature extraction methods perform efficiently, surpassing some widely used baseline features on the competitive DEAP dataset, indicating that multifractal analysis could serve as basis for the development of robust models for affective state recognition.

LGNov 28, 2019
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music

Agelos Kratimenos, Kleanthis Avramidis, Christos Garoufis et al.

Instrument classification is one of the fields in Music Information Retrieval (MIR) that has attracted a lot of research interest. However, the majority of that is dealing with monophonic music, while efforts on polyphonic material mainly focus on predominant instrument recognition. In this paper, we propose an approach for instrument classification in polyphonic music from purely monophonic data, that involves performing data augmentation by mixing different audio segments. A variety of data augmentation techniques focusing on different sonic aspects, such as overlaying audio segments of the same genre, as well as pitch and tempo-based synchronization, are explored. We utilize Convolutional Neural Networks for the classification task, comparing shallow to deep network architectures. We further investigate the usage of a combination of the above classifiers, each trained on a single augmented dataset. An ensemble of VGG-like classifiers, trained on non-augmented, pitch-synchronized, tempo-synchronized and genre-similar excerpts, respectively, yields the best results, achieving slightly above 80% in terms of label ranking average precision (LRAP) in the IRMAS test set.ruments in over 2300 testing tracks.