Predicting Different Acoustic Features from EEG and towards direct synthesis of Audio Waveform from EEG
This work addresses brain-computer interfaces for speech synthesis, but it appears incremental as it builds on prior methods for predicting acoustic features from EEG.
The paper tackles audio synthesis from EEG signals. It introduces a deep learning model that generates audio waveforms directly from raw EEG data, and it also predicts 16 acoustic features from EEG features. The results show how EEG signals relate to acoustic features during speech perception and production.
In [1,2], the authors provided preliminary results for synthesizing speech from electroencephalography (EEG) features: they first predict acoustic features from EEG features and then reconstruct the speech from the predicted acoustic features using the Griffin-Lim reconstruction algorithm. In this paper, we first introduce a deep learning model that takes raw EEG waveform signals as input and directly produces an audio waveform as output. We then demonstrate the prediction of 16 different acoustic features from EEG features. We report results for both the speaking and listening conditions. The results presented in this paper show how different acoustic features are related to non-invasive neural EEG signals recorded during speech perception and production.
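For readers unfamiliar with the Griffin-Lim reconstruction step used in [1,2], the following is a minimal pure-Python sketch of the general idea: estimate a waveform from a magnitude-only spectrogram by alternating between the time domain and the STFT domain while keeping the target magnitudes fixed. It is not the implementation from [1,2] or from this paper. The function names (`hann`, `stft`, `istft`, `griffin_lim`, `spectral_error`), the window length, the hop size, and the synthetic test tone are all assumptions chosen for illustration, and the magnitudes here come from a known signal rather than from EEG-predicted acoustic features.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    # Naive O(n^2) discrete Fourier transform (fine for a toy example).
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    # Inverse DFT; keep the real part because the target signal is real.
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft(signal, win, hop):
    # Windowed DFT of each frame; frames start every `hop` samples.
    n = len(win)
    return [dft([signal[s + i] * win[i] for i in range(n)])
            for s in range(0, len(signal) - n + 1, hop)]


def istft(frames, win, hop, length):
    # Least-squares overlap-add inverse STFT (normalized by summed squared window).
    n = len(win)
    out = [0.0] * length
    norm = [0.0] * length
    for j, spec in enumerate(frames):
        frame = idft(spec)
        for i in range(n):
            out[j * hop + i] += frame[i] * win[i]
            norm[j * hop + i] += win[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitude, win, hop, length, n_iter=30, seed=0):
    # Start from random phase, then repeatedly: invert, re-analyze,
    # keep the new phase, and restore the target magnitude.
    rng = random.Random(seed)
    spec = [[m * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for m in frame]
            for frame in magnitude]
    for _ in range(n_iter):
        signal = istft(spec, win, hop, length)
        est = stft(signal, win, hop)
        spec = [[m * (e / abs(e)) if abs(e) > 1e-12 else m for m, e in zip(mf, ef)]
                for mf, ef in zip(magnitude, est)]
    return istft(spec, win, hop, length)


def spectral_error(signal, magnitude, win, hop):
    # Squared distance between the signal's STFT magnitude and the target magnitude.
    est = stft(signal, win, hop)
    return sum((abs(e) - m) ** 2
               for ef, mf in zip(est, magnitude) for e, m in zip(ef, mf))


# Toy usage: recover a two-tone signal from its magnitude spectrogram only.
WIN, HOP, LENGTH = hann(16), 4, 64
tone = [math.sin(2 * math.pi * 3 * t / LENGTH) + 0.5 * math.sin(2 * math.pi * 7 * t / LENGTH)
        for t in range(LENGTH)]
target_mag = [[abs(c) for c in frame] for frame in stft(tone, WIN, HOP)]
recon = griffin_lim(target_mag, WIN, HOP, LENGTH, n_iter=30)
print("spectral error after 30 iterations:", spectral_error(recon, target_mag, WIN, HOP))
```

Because the phase is estimated rather than observed, the recovered waveform generally matches the target only in spectrogram magnitude, not sample by sample. In practice a library routine with FFT-based transforms would replace the naive DFT used here.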