SD Jun 8, 2021

Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

arXiv:2106.04552v2
6 citations
Originality: Incremental advance
AI Analysis

This work addresses multi-speaker modeling for silent speech interfaces, a technology that could benefit individuals with speech impairments. The advance is incremental, since it adapts an existing method to a new domain.

The paper tackles the speaker-dependent limitations of ultrasound-based silent speech interfaces by adapting the x-vector framework to model speaker characteristics from ultrasound tongue videos. It reports speaker recognition error rates below 3%, finds that the embedding vectors generalize to unseen speakers, and observes a marginal reduction in spectral estimation error in the multi-speaker scenario.

Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings represent not only the linguistic content, but are also highly specific to the actual speaker. Hence, due to the lack of multi-speaker data sets, researchers have so far concentrated on speaker-dependent modeling. Here, we present multi-speaker experiments using the recently published TaL80 corpus. To model speaker characteristics, we adjusted the x-vector framework popular in speech processing to operate with ultrasound tongue videos. Next, we performed speaker recognition experiments using 50 speakers from the corpus. Then, we created speaker embedding vectors and evaluated them on the remaining speakers. Finally, we examined how the embedding vector influences the accuracy of our ultrasound-to-speech conversion network in a multi-speaker scenario. In the experiments we attained speaker recognition error rates below 3%, and we also found that the embedding vectors generalize nicely to unseen speakers. Our first attempt to apply them in a multi-speaker silent speech framework brought about a marginal reduction in the error rate of the spectral estimation step.
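
The abstract does not spell out the network details, so the following is only a minimal sketch of the general idea behind x-vector style speaker embeddings: frame-level features from a variable-length recording are collapsed by statistics pooling (mean and standard deviation over time) into a fixed-length vector. The feature dimension, the random stand-in features, and the `condition_on_speaker` helper (which simply concatenates the embedding to every input frame) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def statistics_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, feat_dim) features into a (2 * feat_dim,) vector.

    This is the pooling step used in x-vector style models: the mean and
    standard deviation over the time axis are concatenated, so recordings
    of any length map to a vector of the same size.
    """
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, a common way to compare two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def condition_on_speaker(frames: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning scheme (an assumption, not from the paper):
    append the same speaker embedding to every frame of the input."""
    tiled = np.tile(embedding, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


# Random stand-ins for frame-level features extracted from ultrasound video;
# in a real system these would come from the frame-level layers of a network.
rng = np.random.default_rng(0)
short_clip = rng.normal(size=(40, 16))    # 40 frames, 16 features per frame
long_clip = rng.normal(size=(250, 16))    # 250 frames, same feature size

emb_short = statistics_pooling(short_clip)
emb_long = statistics_pooling(long_clip)
conditioned = condition_on_speaker(short_clip, emb_short)

print(emb_short.shape, emb_long.shape, conditioned.shape)
```

Both clips yield embeddings of the same length despite their different durations, which is what lets a single vector summarize a speaker. In a trained x-vector model the embedding is usually read from a layer after the pooling step rather than from the pooled statistics themselves.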

