Seo-Hyun Lee

HC
h-index: 9
Papers: 17
Citations: 171
Novelty: 42%
AI Score: 41

17 Papers

AS Jul 26, 2023
Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG

Soowon Kim, Young-Eun Lee, Seo-Hyun Lee et al.

Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.
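
The forward (noising) half of a DDPM can be illustrated with a short sketch. It is not the Diff-E implementation: the linear beta schedule, the 1000-step horizon, and the sine-wave stand-in for an EEG channel are all assumptions chosen for illustration. It only shows the standard closed-form sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (the endpoint values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def forward_noise(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) and return it with the noise that was used."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
eeg_window = [math.sin(0.1 * n) for n in range(64)]  # hypothetical single-channel window
noisy, eps = forward_noise(eeg_window, t=999, alpha_bars=alpha_bars, rng=rng)
```

At the last step almost all of the original signal has been replaced by Gaussian noise; a denoising network is then trained to predict `eps` from `noisy` and `t`.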

CL Nov 14, 2023
Brain-Driven Representation Learning Based on Diffusion Model

Soowon Kim, Seo-Hyun Lee, Young-Eun Lee et al.

Interpreting EEG signals linked to spoken language presents a complex challenge, given the data's intricate temporal and spatial attributes, as well as the various noise factors. Denoising diffusion probabilistic models (DDPMs), which have recently gained prominence in diverse areas for their capabilities in representation learning, are explored in our research as a means to address this issue. Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms and established baseline models in accuracy. Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals. This could lead to significant advances in brain-computer interfaces tailored for spoken communication.

HC Jan 19, 2023
Subject-Independent Classification of Brain Signals using Skip Connections

Soowon Kim, Ji-Won Lee, Young-Eun Lee et al.

Untapped potential for new forms of human-to-human communication can be found in the active research field of decoding brain signals of human speech. A brain-computer interface system can be implemented using electroencephalogram signals because they pose less clinical risk and can be acquired using portable instruments. One of the most interesting tasks for the brain-computer interface system is decoding words from the raw electroencephalogram signals. Before a brain-computer interface can be used by a new user, current electroencephalogram-based brain-computer interface research typically requires a subject-specific adaptation stage. In contrast, the subject-independent setting is highly desirable because it allows a well-trained model to be applied to new users with little or no precalibration. In light of this, the emphasis is on creating an efficient decoder that can be employed adaptively in subject-independent circumstances. Our proposal is to explicitly apply skip connections between convolutional layers to enable the flow of mutual information between layers. The output of the encoder is then passed through the fully-connected layer to finally represent the probabilities of the 13 classes. In this study, overt speech was used to record the electroencephalogram data of 16 participants. The results show that when the skip connection is present, the classification performance improves notably.

HC Oct 31, 2025
Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication

Deok-Seon Kim, Seo-Hyun Lee, Kang Yin et al.

Brain-to-speech (BTS) systems represent a groundbreaking approach to human communication by enabling the direct transformation of neural activity into linguistic expressions. While recent non-invasive BTS studies have largely focused on decoding predefined words or sentences, achieving open-vocabulary neural communication comparable to natural human interaction requires decoding unconstrained speech. Additionally, effectively integrating diverse signals derived from speech is crucial for developing personalized and adaptive neural communication and rehabilitation solutions for patients. This study investigates the potential of speech synthesis for previously unseen sentences across various speech modes by leveraging phoneme-level information extracted from high-density electroencephalography (EEG) signals, both independently and in conjunction with electromyography (EMG) signals. Furthermore, we examine the properties affecting phoneme decoding accuracy during sentence reconstruction and offer neurophysiological insights to further enhance EEG decoding for more effective neural communication solutions. Our findings underscore the feasibility of biosignal-based sentence-level speech synthesis for reconstructing unseen sentences, highlighting a significant step toward developing open-vocabulary neural communication systems adapted to diverse patient needs and conditions. Additionally, this study provides meaningful insights into the development of communication and rehabilitation solutions utilizing EEG-based decoding technologies.

NC Nov 11, 2025
Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration

Byung-Kwan Ko, Soowon Kim, Seo-Hyun Lee

Achieving robust generalization across individuals remains a major challenge in electroencephalogram-based imagined speech decoding due to substantial variability in neural activity patterns. This study examined how training dynamics and lightweight subject-specific adaptation influence cross-subject performance in a neural decoding framework. A cyclic inter-subject training approach, involving shorter per-subject training segments and frequent alternation among subjects, led to modest yet consistent improvements in decoding performance across unseen target data. Furthermore, under the subject-calibrated leave-one-subject-out scheme, incorporating only 10 % of the target subject's data for calibration achieved an accuracy of 0.781 and an AUC of 0.801, demonstrating the effectiveness of few-shot adaptation. These findings suggest that integrating cyclic training with minimal calibration provides a simple and effective strategy for developing scalable, user-adaptive brain-computer interface systems that balance generalization and personalization.
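
A subject-calibrated leave-one-subject-out split can be sketched as follows. This is an illustration rather than the authors' pipeline: the function name `loso_with_calibration`, the random selection of calibration trials, and the toy subject and trial identifiers are assumptions. Only the 10 % calibration fraction comes from the abstract.

```python
import random


def loso_with_calibration(trials_by_subject, calib_fraction=0.10, seed=0):
    """Yield (target, train, calib, test) for each held-out subject.

    trials_by_subject maps a subject id to a list of trial ids. The training
    set holds every trial from the other subjects; the target subject's trials
    are split into a small calibration set and a test set.
    """
    rng = random.Random(seed)
    for target, trials in trials_by_subject.items():
        train = [t for s, ts in trials_by_subject.items() if s != target for t in ts]
        shuffled = trials[:]
        rng.shuffle(shuffled)  # assumption: calibration trials are picked at random
        n_calib = max(1, round(calib_fraction * len(shuffled)))
        yield target, train, shuffled[:n_calib], shuffled[n_calib:]


# Toy data: 4 hypothetical subjects with 20 trials each.
data = {f"S{i}": [f"S{i}_trial{j}" for j in range(20)] for i in range(1, 5)}
folds = list(loso_with_calibration(data))
```

Each fold trains on the other subjects, adapts on the calibration trials, and reports performance on the remaining target trials only, so the calibration data never leaks into the test score.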

AI Nov 11, 2025
Confidence-Aware Neural Decoding of Overt Speech from EEG: Toward Robust Brain-Computer Interfaces

Soowon Kim, Byung-Kwan Ko, Seo-Hyun Lee

Non-invasive brain-computer interfaces that decode spoken commands from electroencephalogram must be both accurate and trustworthy. We present a confidence-aware decoding framework that couples deep ensembles of compact, speech-oriented convolutional networks with post-hoc calibration and selective classification. Uncertainty is quantified using ensemble-based predictive entropy, top-two margin, and mutual information, and decisions are made with an abstain option governed by an accuracy-coverage operating point. The approach is evaluated on a multi-class overt speech dataset using a leakage-safe, block-stratified split that respects temporal contiguity. Compared with widely used baselines, the proposed method yields more reliable probability estimates, improved selective performance across operating points, and balanced per-class acceptance. These results suggest that confidence-aware neural decoding can provide robust, deployment-oriented behavior for real-world brain-computer interface communication systems.
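
The three uncertainty measures named above have standard definitions for an ensemble, and a minimal sketch of them with an abstain rule follows. The function names, the entropy threshold of 0.5 nats, and the toy probability vectors are assumptions. In the paper the threshold is set by an accuracy-coverage operating point, which this sketch does not reproduce.

```python
import math


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def ensemble_uncertainty(member_probs):
    """member_probs holds one class-probability list per ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_p = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    predictive_entropy = entropy(mean_p)
    expected_entropy = sum(entropy(m) for m in member_probs) / n_members
    top_two = sorted(mean_p, reverse=True)[:2]
    return {
        "prediction": max(range(n_classes), key=lambda c: mean_p[c]),
        "predictive_entropy": predictive_entropy,
        "margin": top_two[0] - top_two[1],
        # Mutual information: entropy of the mean minus the mean of the entropies.
        "mutual_information": predictive_entropy - expected_entropy,
    }


def decide(member_probs, max_entropy=0.5):
    """Return the predicted class, or None to abstain (threshold is hypothetical)."""
    u = ensemble_uncertainty(member_probs)
    return u["prediction"] if u["predictive_entropy"] <= max_entropy else None


agree = [[0.9, 0.05, 0.05], [0.92, 0.04, 0.04], [0.88, 0.07, 0.05]]
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
```

When the members agree, the decoder commits to class 0; when each member confidently picks a different class, the mutual information is high and the decoder abstains.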

NC Jan 9, 2025
Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding

Ji-Ha Park, Seo-Hyun Lee, Soowon Kim et al.

Decoding text, speech, or images from human neural signals holds promising potential both as neuroprosthesis for patients and as innovative communication tools for general users. Although neural signals contain various information on speech intentions, movements, and phonetic details, generating informative outputs from them remains challenging, with most studies focusing on decoding short intentions or producing fragmented outputs. In this study, we developed a diffusion model-based framework to decode visual speech intentions from speech-related non-invasive brain signals, to facilitate face-to-face neural communication. We designed an experiment to consolidate various phonemes to train visemes of each phoneme, aiming to learn the representation of corresponding lip formations from neural signals. By decoding visemes from both isolated trials and continuous sentences, we successfully reconstructed coherent lip movements, effectively bridging the gap between brain signals and dynamic visual interfaces. The results highlight the potential of viseme decoding and talking face reconstruction from human neural signals, marking a significant step toward dynamic neural communication systems and speech neuroprosthesis for patients.

AI Nov 14, 2024
Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals

Jung-Sun Lee, Ha-Na Jo, Seo-Hyun Lee

Brain signals carry various information relevant to human actions and mental imagery, making them crucial to interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states through non-invasive approaches remains a significant yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model utilizing the spatial convolutional neural network module demonstrated superior performance compared to other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences compared to the other speech paradigms.

LG Nov 14, 2024
Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration

Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee

In recent years, brain-computer interfaces have made advances in decoding various motor-related tasks, including gesture recognition and movement classification, utilizing electroencephalogram (EEG) data. These developments are fundamental in exploring how neural signals can be interpreted to recognize specific physical actions. This study centers on a written alphabet classification task, where we aim to decode EEG signals associated with handwriting. To achieve this, we incorporate hand kinematics to guide the extraction of the consistent embeddings from high-dimensional neural recordings using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG, are processed by a parallel convolutional neural network model that extracts features from both data sources simultaneously. The model classifies nine different handwritten characters, including symbols such as exclamation marks and commas, within the alphabet. We evaluate the model using a quantitative five-fold cross-validation approach and explore the structure of the embedding space through visualizations. Our approach achieves a classification accuracy of 91 % for the nine-class task, demonstrating the feasibility of fine-grained handwriting decoding from EEG.

AI Nov 14, 2024
Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces

Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim

Brain-computer interfaces (BCIs) have shown promise in enabling communication for individuals with motor impairments. Recent advancements like brain-to-speech technology aim to reconstruct speech from neural activity. However, decoding communication-related paradigms, such as imagined speech and visual imagery, using non-invasive techniques remains challenging. This study analyzes brain dynamics in these two paradigms by examining neural synchronization and functional connectivity through phase-locking values (PLV) in EEG data from 16 participants. Results show that visual imagery produces higher PLV values in visual cortex, engaging spatial networks, while imagined speech demonstrates consistent synchronization, primarily engaging language-related regions. These findings suggest that imagined speech is suitable for language-driven BCI applications, while visual imagery can complement BCI systems for users with speech impairments. Personalized calibration is crucial for optimizing BCI performance.
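
The phase-locking value between two channels is the magnitude of the mean unit phasor of their phase difference. The sketch below assumes the instantaneous phases have already been extracted (for example by a Hilbert transform after band-pass filtering, which is not shown). The 10 Hz rhythm and 250 Hz sampling rate are hypothetical.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """PLV = |mean over t of exp(i * (phase_a[t] - phase_b[t]))|, in [0, 1]."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and equally long")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


n = 1000
# Phase of a 10 Hz rhythm sampled at 250 Hz (both values are assumptions).
base = [2 * math.pi * 10 * t / 250 for t in range(n)]
# Same rhythm with a constant lag: perfectly phase-locked.
locked = [p + 0.7 for p in base]
# Lag sweeps through one full cycle: no consistent phase relation.
drifting = [p + 2 * math.pi * t / n for t, p in enumerate(base)]

plv_locked = phase_locking_value(base, locked)
plv_drifting = phase_locking_value(base, drifting)
```

A constant lag gives a PLV of 1 regardless of its size, while a lag that drifts uniformly through a full cycle averages out to a PLV near 0.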

AI Nov 14, 2024
Dynamic Neural Communication: Convergence of Computer Vision and Brain-Computer Interface

Ji-Ha Park, Seo-Hyun Lee, Soowon Kim et al.

Interpreting human neural signals to decode static speech intentions such as text or images and dynamic speech intentions such as audio or video is showing great potential as an innovative communication tool. Human communication accompanies various features, such as articulatory movements, facial expressions, and internal speech, all of which are reflected in neural signals. However, most studies only generate short or fragmented outputs, while providing informative communication by leveraging various features from neural signals remains challenging. In this study, we introduce a dynamic neural communication method that leverages current computer vision and brain-computer interface technologies. Our approach captures the user's intentions from neural signals and decodes visemes in short time steps to produce dynamic visual outputs. The results demonstrate the potential to rapidly capture and reconstruct lip movements during natural speech attempts from human neural signals, enabling dynamic neural communication through the convergence of computer vision and brain-computer interface.

AI Dec 10, 2023
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

Seo-Hyun Lee, Young-Eun Lee, Soowon Kim et al.

Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With the current discoveries on representation learning and the development of speech synthesis technologies, direct translation of brain signals into speech has shown great promise. In particular, the processed input features and neural speech embeddings given to the neural network play a significant role in the overall performance when using deep generative models for speech generation from brain signals. In this paper, we introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals, which may ultimately facilitate innovation in non-verbal communication. We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying the neurophysiological activation during speech, which may play a significant role in speech synthesis work.

HC Dec 16, 2021
Toward Imagined Speech based Smart Communication System: Potential Applications on Metaverse Conditions

Seo-Hyun Lee, Young-Eun Lee, Seong-Whan Lee

The metaverse provides an alternative platform for human interaction in the virtual world. Since a virtual platform places few restrictions on changing the surrounding environment or the appearance of avatars, it can serve as a platform that reflects human thoughts or even dreams, at least in the metaverse world. When it is merged with current brain-computer interface (BCI) technology, which enables system control via brain signals, a new paradigm of human interaction through the mind may be established in metaverse conditions. Recent BCI systems aim to provide user-friendly and intuitive means of communication using brain signals. Imagined speech has become an alternative neuro-paradigm for communicative BCI since it relies directly on a person's speech production process, rather than using speech-unrelated neural activity as the means of communication. In this paper, we propose a brain-to-speech (BTS) system for real-world smart communication using brain signals. We also demonstrate imagined speech based smart home control through communication with a virtual assistant, which can be one of the future applications of a brain-metaverse system. We performed pseudo-online analysis using imagined speech electroencephalography data of nine subjects to investigate the potential use of a virtual BTS system in the real world. Average accuracies of 46.54 % (chance level = 7.7 %) and 75.56 % (chance level = 50 %) were achieved in the thirteen-class and binary pseudo-online analyses, respectively. Our results support the potential of imagined speech based smart communication to be applied in the metaverse world.
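
The quoted chance levels follow from uniform guessing over balanced classes, which the short check below reproduces. The assumption of balanced classes and the helper names `chance_level` and `times_chance` are illustrative and not taken from the paper.

```python
def chance_level(n_classes):
    """Accuracy of uniform random guessing, assuming balanced classes."""
    return 1.0 / n_classes


def times_chance(accuracy, n_classes):
    """How many times above chance a reported accuracy is."""
    return accuracy / chance_level(n_classes)


thirteen_class_chance = round(100 * chance_level(13), 1)  # percent
binary_chance = round(100 * chance_level(2), 1)           # percent

ratio_thirteen = times_chance(0.4654, 13)
ratio_binary = times_chance(0.7556, 2)
```

Under this assumption the thirteen-class result is about six times chance and the binary result about one and a half times chance, so the two accuracies are not directly comparable without reference to their chance levels.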

HC Dec 15, 2021
EEG-Transformer: Self-attention from Transformer Architecture for Decoding EEG of Imagined Speech

Young-Eun Lee, Seo-Hyun Lee

Transformers are groundbreaking architectures that have changed the course of deep learning, and many high-performance models are being developed based on transformer architectures. Transformers are implemented only with attention in an encoder-decoder structure following seq2seq, without using RNNs, yet achieve better performance than RNNs. Herein, we investigate a decoding technique for electroencephalography (EEG) during imagined speech and overt speech that uses the self-attention module from the transformer architecture. We performed classification of nine subjects using a convolutional neural network based on EEGNet that captures temporal-spectral-spatial features from EEG of imagined speech and overt speech. Furthermore, we applied the self-attention module to EEG decoding to improve the performance and lower the number of parameters. Our results demonstrate the possibility of decoding brain activities of imagined speech and overt speech using attention modules. Also, only single-channel EEG or ear-EEG can be used to decode imagined speech for practical BCIs.

HC May 31, 2021
Voice of Your Brain: Cognitive Representations of Imagined Speech, Overt Speech, and Speech Perception Based on EEG

Seo-Hyun Lee, Young-Eun Lee, Seong-Whan Lee

Every person has their own voice; likewise, brain signals display distinct neural representations for each individual. Although recent studies have revealed the robustness of speech-related paradigms for efficient brain-computer interfaces, the distinction between their cognitive representations and their practical usability remains to be discovered. Herein, we investigate the distinct brain patterns from electroencephalography (EEG) during imagined speech, overt speech, and speech perception in terms of subject variations, with a practical use in speaker identification from single-channel EEG. We performed classification of nine subjects using a deep neural network that captures temporal-spectral-spatial features from EEG of imagined speech, overt speech, and speech perception. Furthermore, we demonstrated the underlying neural features of individual subjects while performing imagined speech by comparing the functional connectivity and the EEG envelope features. Our results demonstrate the possibility of subject identification from single-channel EEG of imagined speech and overt speech. Also, the comparison of the three speech-related paradigms will provide valuable information for the practical use of speech-related brain signals in further studies.

HC Dec 7, 2020
Functional Connectivity of Imagined Speech and Visual Imagery based on Spectral Dynamics

Seo-Hyun Lee, Minji Lee, Seong-Whan Lee

Recent advances in brain-computer interface technology have shown the potential of imagined speech and visual imagery as robust paradigms for intuitive brain-computer interface communication. However, the internal dynamics of the two paradigms, along with their intrinsic features, have not been revealed. In this paper, we investigated the functional connectivity of the two paradigms, considering various frequency ranges. A dataset of sixteen subjects performing thirteen-class imagined speech and visual imagery was used for the analysis. The phase-locking value of imagined speech and visual imagery was analyzed in seven cortical regions with four frequency ranges. We compared the functional connectivity of imagined speech and visual imagery with the resting state to investigate the brain alterations during the imagery. The phase-locking value in the whole brain region exhibited a significant decrease during both imagined speech and visual imagery. Broca's and Wernicke's areas, along with the auditory cortex, mainly exhibited a significant decrease in imagined speech, and the prefrontal cortex and the auditory cortex showed a significant decrease in the visual imagery paradigm. Further investigation of brain connectivity along with the decoding performance of the two paradigms may play a crucial role as a performance predictor.

HC Feb 4, 2020
Spatio-Temporal Dynamics of Visual Imagery for Intuitive Brain-Computer Interface

Seo-Hyun Lee, Minji Lee, Seong-Whan Lee

Visual imagery is an intuitive brain-computer interface paradigm, referring to the emergence of the visual scene. Despite its convenience, analysis of its intrinsic characteristics is limited. In this study, we demonstrate how time interval and channel selection affect the decoding performance of multi-class visual imagery. We divided the epoch into time intervals of 0-1 s and 1-2 s and performed six-class classification in three different brain regions: whole brain, visual cortex, and prefrontal cortex. Across time intervals, the 0-1 s group showed an average classification accuracy of 24.2 %, which was significantly higher than the 1-2 s group in the prefrontal cortex. Across the three regions, the prefrontal cortex showed significantly higher classification accuracy than the visual cortex in the 0-1 s interval group, implying cognitive arousal during visual imagery. This finding provides crucial information for improving the decoding performance.