Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network
This work addresses the challenge of improving auditory attention decoding accuracy for shorter decoding windows. The contribution is incremental: it builds on prior deep learning approaches by making better use of the spatial and temporal features of EEG.
The paper tackles the problem of decoding auditory attention from EEG data in multi-talker scenarios. Its proposed CRNN-based classification model achieves around 90% accuracy for short decoding windows (2 s and 5 s) and outperforms both the linear model and the state-of-the-art DNN models it was compared against.
The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although the linear model-based method has been widely used in AAD, its linear assumption is considered oversimplified, and its decoding accuracy remains lower for shorter decoding windows. Recently, nonlinear models based on deep neural networks (DNN) have been proposed to solve this problem. However, these models did not fully utilize both the spatial and temporal features of EEG, and the interpretability of DNN models was rarely investigated. In this paper, we proposed a novel convolutional recurrent neural network (CRNN) based regression model and a CRNN-based classification model, and compared them with both the linear model and the state-of-the-art DNN models. Results showed that our proposed CRNN-based classification model outperformed the others for shorter decoding windows (around 90% accuracy for 2 s and 5 s). Although lower than that of the classification models, the decoding accuracy of the proposed CRNN-based regression model was about 5% higher than that of the other regression models. The interpretability of the DNN models was also investigated by visualizing the layers' weights.
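To make the notion of a decoding window concrete, the following is a minimal sketch of how a regression-style decoder's output might be scored window by window. It is not the CRNN proposed here. It assumes the common stimulus-reconstruction setup, in which a speech envelope is reconstructed from EEG and compared with each talker's envelope by Pearson correlation; the abstract does not state this formulation. The function names, the two-talker labels "A" and "B", and the 64 Hz sampling rate are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def decode_windows(reconstructed, env_a, env_b, fs, window_s):
    """Split the signals into non-overlapping windows of window_s seconds and,
    in each window, pick the talker whose envelope correlates best with the
    envelope reconstructed from EEG (assumed decision rule)."""
    win = int(round(fs * window_s))
    decisions = []
    for start in range(0, len(reconstructed) - win + 1, win):
        seg = slice(start, start + win)
        r_a = pearson(reconstructed[seg], env_a[seg])
        r_b = pearson(reconstructed[seg], env_b[seg])
        decisions.append("A" if r_a >= r_b else "B")
    return decisions


def decoding_accuracy(decisions, attended):
    """Fraction of windows in which the attended talker was chosen."""
    return sum(d == attended for d in decisions) / len(decisions)


# Synthetic demo (hypothetical data): talker A is attended, and the
# "reconstruction" is talker A's envelope buried in Gaussian noise.
random.seed(0)
fs = 64  # Hz, assumed envelope sampling rate
n = fs * 60  # one minute of data
env_a = [abs(math.sin(0.05 * i)) for i in range(n)]
env_b = [abs(math.cos(0.13 * i)) for i in range(n)]
reconstructed = [a + random.gauss(0.0, 2.0) for a in env_a]

for window_s in (2, 5):
    decisions = decode_windows(reconstructed, env_a, env_b, fs, window_s)
    print(window_s, "s windows:", decoding_accuracy(decisions, "A"))
```

Each decision uses only the samples inside its own window, so a shorter window gives a noisier correlation estimate. This is the regime in which the abstract reports lower accuracy for the linear model. A classification model would instead output the attended talker for each window directly, without the correlation step.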