Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework
This work aims to enable communication for people with dysarthria or anarthria by exploring a less invasive neural recording method. It is incremental in that it builds on prior electrocorticographic approaches.
The study synthesizes speech from minimally invasive intracranial depth electrodes (sEEG) using a recurrent encoder-decoder model, achieving correlations of up to 0.8 for audio reconstruction and outperforming an existing benchmark based on non-regressive convolutional neural networks.
Speech neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants: stereotactic EEG (sEEG), which provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed from these minimally invasive recordings with correlations of up to 0.8, despite limited amounts of training data. In particular, the architecture we employ naturally captures the temporal structure of the data and thereby outperforms an existing benchmark based on non-regressive convolutional neural networks.
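The abstract reports reconstruction quality as a correlation but does not say how it is computed. As a minimal sketch, assuming the score is a Pearson correlation between original and reconstructed spectrograms, computed per spectral bin over time and then averaged, it could look like the following. The function name `mean_spectral_correlation` and the `(n_frames, n_bins)` array layout are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def mean_spectral_correlation(original, reconstructed):
    """Mean Pearson correlation across spectral bins.

    Assumption (not from the paper): both inputs are spectrograms of
    shape (n_frames, n_bins), aligned in time.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape:
        raise ValueError("spectrograms must have the same shape")

    # Centre every spectral bin over the time axis.
    o = original - original.mean(axis=0)
    r = reconstructed - reconstructed.mean(axis=0)

    # Pearson correlation per bin: covariance over product of norms.
    num = (o * r).sum(axis=0)
    den = np.sqrt((o ** 2).sum(axis=0) * (r ** 2).sum(axis=0))

    # Bins with zero variance have no defined correlation; skip them.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(den > 0, num / den, np.nan)
    return float(np.nanmean(corr))
```

Because Pearson correlation is invariant to scaling and offset, a score near 0.8 under this definition would indicate that the reconstructed spectral dynamics track the original closely, without implying that absolute amplitudes match.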