Tom Francart

AS
h-index: 34
7 papers
187 citations
Novelty: 52%
AI Score: 36

7 Papers

AS · Jul 1, 2022
Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders

Lies Bollens, Tom Francart, Hugo Van Hamme

The electroencephalogram (EEG) is a powerful method to understand how the brain processes speech. Deep neural networks have recently replaced linear models for this purpose and yield promising results. In related EEG classification fields, it has been shown that explicitly modeling subject-invariant features improves generalization of models across subjects and benefits classification accuracy. In this work, we adapt factorized hierarchical variational autoencoders to exploit parallel EEG recordings of the same stimuli. We model EEG into two disentangled latent spaces. Subject classification accuracy reaches 98.96% and 1.60% on the subject and content latent spaces, respectively, whereas binary content classification reaches an accuracy of 51.51% and 62.91% on the subject and content latent spaces, respectively.

LG · Oct 17, 2023
Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data

Nicolas Heintz, Tom Francart, Alexander Bertrand

Linear Discriminant Analysis (LDA) is one of the oldest and most popular linear methods for supervised classification problems. In this paper, we demonstrate that it is possible to compute the exact projection vector from LDA models based on unlabelled data, if some minimal prior information is available. More precisely, we show that only one of the following three pieces of information is actually sufficient to compute the LDA projection vector if only unlabelled data are available: (1) the class average of one of the two classes, (2) the difference between both class averages (up to a scaling), or (3) the class covariance matrices (up to a scaling). These theoretical results are validated in numerical experiments, demonstrating that this minimally informed Linear Discriminant Analysis (MILDA) model closely matches the performance of a supervised LDA model. Furthermore, we show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA and is able to quickly adapt to non-stationary data, making it well-suited to use as an adaptive classifier.
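
Case (2) above can be made plausible with a standard identity, sketched here in Python with NumPy. This is not taken from the paper and is not the authors' MILDA derivation: it only shows that, when the difference between the class means is known, the inverse of the total (unlabelled) covariance applied to that difference points in the same direction as the supervised LDA vector, because the total covariance equals the within-class covariance plus a rank-one term along the mean difference. The function names, the synthetic data, and the use of the sample mean difference as "prior information" are all assumptions made for illustration.

```python
import numpy as np


def lda_direction_supervised(X, y):
    """Classic two-class LDA direction Sw^{-1} (m1 - m0), using the labels."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, normalised by the total sample count.
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / len(X)
    return np.linalg.solve(Sw, m1 - m0)


def lda_direction_from_mean_difference(X, mean_diff):
    """Same direction (up to scaling) from unlabelled data plus the mean difference.

    Illustrative assumption: mean_diff is known a priori. No labels are used.
    """
    St = np.cov(X, rowvar=False, bias=True)  # total covariance of all samples
    return np.linalg.solve(St, mean_diff)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def demo(seed=0):
    """Compare both directions on synthetic two-class Gaussian data."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(4, 4))
    cov = A @ A.T + np.eye(4)  # shared, positive-definite class covariance
    X0 = rng.multivariate_normal(np.zeros(4), cov, size=300)
    X1 = rng.multivariate_normal(np.array([1.0, -0.5, 0.3, 0.8]), cov, size=200)
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(300, dtype=int), np.ones(200, dtype=int)]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)  # stands in for the prior knowledge
    w_supervised = lda_direction_supervised(X, y)
    w_unlabelled = lda_direction_from_mean_difference(X, mean_diff)
    return cosine(w_supervised, w_unlabelled)
```

Calling `demo()` returns the cosine similarity between the two directions. In this sketch the mean difference is computed from the labelled samples, so the two vectors agree up to a positive scale factor; with an imperfect prior the agreement would only be approximate.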

SP · Jun 30, 2025
Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models

Nicolas Heintz, Tom Francart, Alexander Bertrand

Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that a subject is much less likely to switch attention than to keep attending the same speaker at any moment in time. We show how an HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use and applicable in both real-time and offline settings.
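
The idea of a "sticky" attention state can be illustrated with a minimal two-state HMM forward filter, sketched below in Python with NumPy. This is a generic textbook forward algorithm, not the authors' implementation: the binary emission model (each per-window AAD decision is correct with a fixed probability `accuracy`), the parameter values, and the function name `hmm_forward_filter` are assumptions chosen for illustration. It corresponds to the causal (real-time) setting; an offline variant would add a backward pass.

```python
import numpy as np


def hmm_forward_filter(decisions, accuracy=0.65, p_switch=0.02):
    """Causal posterior over the attended speaker (0 or 1) per window.

    decisions : sequence of raw per-window AAD decisions (0 or 1).
    accuracy  : assumed probability that a raw decision is correct.
    p_switch  : assumed probability that attention switches between windows.
    """
    decisions = np.asarray(decisions, dtype=int)
    # transition[i, j] = P(next state j | current state i); switching is rare.
    transition = np.array([[1 - p_switch, p_switch],
                           [p_switch, 1 - p_switch]])
    # emission[s, d] = P(raw decision d | attended speaker s).
    emission = np.array([[accuracy, 1 - accuracy],
                         [1 - accuracy, accuracy]])
    belief = np.array([0.5, 0.5])  # uninformative prior over the two speakers
    posteriors = np.empty((len(decisions), 2))
    for t, d in enumerate(decisions):
        predicted = belief @ transition        # propagate through the transition model
        unnormalised = predicted * emission[:, d]  # weight by the observed decision
        belief = unnormalised / unnormalised.sum()
        posteriors[t] = belief
    return posteriors
```

A smoothed decision per window is then `hmm_forward_filter(decisions).argmax(axis=1)`. With these assumed parameters, a single contradicting window inside a long run of consistent decisions does not flip the smoothed output, which is the behaviour the abstract attributes to modelling the low switching probability.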

AS · Jun 17, 2021
Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model

Mohammad Jalilpour Monesi, Bernd Accou, Tom Francart et al.

Decoding the speech signal that a person is listening to from the human brain via electroencephalography (EEG) can help us understand how our auditory system works. Linear models have been used to reconstruct the EEG from speech or vice versa. Recently, Artificial Neural Networks (ANNs) such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures have outperformed linear models in modeling the relation between EEG and speech. Before attempting to use these models in real-world applications such as hearing tests or (second) language comprehension assessment we need to know what level of speech information is being utilized by these models. In this study, we aim to analyze the performance of an LSTM-based model using different levels of speech features. The task of the model is to determine which of two given speech segments is matched with the recorded EEG. We used low- and high-level speech features including: envelope, mel spectrogram, voice activity, phoneme identity, and word embedding. Our results suggest that the model exploits information about silences, intensity, and broad phonetic classes from the EEG. Furthermore, the mel spectrogram, which contains all this information, yields the highest accuracy (84%) among all the features.
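
The match/mismatch task described above can be sketched as an evaluation loop in Python with NumPy. In this sketch, a plain Pearson correlation stands in for the LSTM-based model, and the EEG is replaced by a synthetic one-dimensional signal; both are assumptions for illustration only and do not reflect the paper's model, features, or data. The names `match_mismatch_accuracy`, `segment_len`, and `shift` are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_mismatch_accuracy(eeg_feature, speech_feature, segment_len, shift):
    """Fraction of segments where the time-aligned speech beats a shifted one.

    eeg_feature    : 1-D stand-in for an EEG-derived representation.
    speech_feature : 1-D speech feature (e.g. an envelope), time-aligned with it.
    shift          : offset in samples used to draw the mismatched segment.
    """
    correct = total = 0
    last_start = len(eeg_feature) - segment_len - shift
    for start in range(0, last_start + 1, segment_len):
        eeg = eeg_feature[start:start + segment_len]
        matched = speech_feature[start:start + segment_len]
        mismatched = speech_feature[start + shift:start + shift + segment_len]
        # The "model" here is just: pick the candidate more correlated with the EEG.
        correct += pearson(eeg, matched) > pearson(eeg, mismatched)
        total += 1
    return correct / total


def demo(seed=0):
    """Synthetic check: the 'EEG' is the speech feature plus noise."""
    rng = np.random.default_rng(seed)
    speech = rng.normal(size=6400)
    eeg = speech + rng.normal(size=6400)
    return match_mismatch_accuracy(eeg, speech, segment_len=320, shift=384)
```

The accuracy returned by such a loop is the quantity the abstract reports per speech feature; swapping the correlation for a trained model's similarity score would leave the evaluation logic unchanged.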

AS · May 14, 2021
Predicting speech intelligibility from EEG in a non-linear classification paradigm

Bernd Accou, Mohammad Jalilpour Monesi, Hugo Van hamme et al.

Objective: Currently, only behavioral speech understanding tests are available, which require active participation of the person being tested. As this is infeasible for certain populations, an objective measure of speech intelligibility is required. Recently, brain imaging data has been used to establish a relationship between stimulus and brain response. Linear models have been successfully linked to speech intelligibility but require per-subject training. We present a deep-learning-based model incorporating dilated convolutions that operates in a match/mismatch paradigm. The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training. Approach: We evaluated the performance of the model as a function of input segment length, EEG frequency band and receptive field size while comparing it to multiple baseline models. Next, we evaluated performance on held-out data and finetuning. Finally, we established a link between the accuracy of our model and the state-of-the-art behavioral MATRIX test. Main results: The dilated convolutional model significantly outperformed the baseline models for every input segment length, for all EEG frequency bands except the delta and theta band, and receptive field sizes between 250 and 500 ms. Additionally, finetuning significantly increased the accuracy on a held-out dataset. Finally, a significant correlation (r=0.59, p=0.0154) was found between the speech reception threshold estimated using the behavioral MATRIX test and our objective method. Significance: Our method is the first to predict the speech reception threshold from EEG for unseen subjects, contributing to objective measures of speech intelligibility.

AS · Oct 5, 2017
Head shadow enhancement with low-frequency beamforming improves sound localization and speech perception for simulated bimodal listeners

Benjamin Dieudonné, Tom Francart

Many hearing-impaired listeners struggle to localize sounds due to poor availability of binaural cues. Listeners with a cochlear implant and a contralateral hearing aid -- so-called bimodal listeners -- are amongst the worst performers, as both interaural time and level differences are poorly transmitted. We present a new method to enhance head shadow in the low frequencies. Head shadow enhancement is achieved with a fixed beamformer with contralateral attenuation in each ear. The method results in interaural level differences which vary monotonically with angle. It also improves low-frequency signal-to-noise ratios in conditions with spatially separated speech and noise. We validated the method in two experiments with acoustic simulations of bimodal listening. In the localization experiment, performance improved from 50.5° to 26.8° root-mean-square error compared with standard omni-directional microphones. In the speech-in-noise experiment, speech was presented from the frontal direction. Speech reception thresholds improved by 15.7 dB SNR when the noise was presented from the cochlear implant side, improved by 7.6 dB SNR when the noise was presented from the hearing aid side, and were not affected when noise was presented from all directions. Apart from bimodal listeners, the method might also be promising for bilateral cochlear implant or hearing aid users. Its low computational complexity makes the method suitable for application in current clinical devices.

Keywords: head shadow enhancement, enhancement of interaural level differences, sound localization, directional hearing, speech in noise, speech intelligibility

PACS: 43.60.Fg, 43.66.Pn, 43.66.Qp, 43.66.Rq, 43.66.Ts, 43.71.-k, 43.71.Es, 43.71.Ky

SD · Feb 18, 2016
EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

Simon Van Eyndhoven, Tom Francart, Alexander Bertrand

OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy, two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. METHODS: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multi-channel speech separation and denoising algorithm. RESULTS: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. CONCLUSIONS: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. SIGNIFICANCE: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain-computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.