AS · LG · SD · SP · Feb 7, 2021

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

arXiv:2102.03786v2 · 16 citations
AI Analysis

This system generates speech from articulatory movements, targeting patients with vocal cord disorders, silent-speech applications, and high-noise environments.

The paper introduces EMA2S, an end-to-end multimodal articulatory-to-speech system that converts articulatory movements into speech signals. The system, which uses a neural-network-based vocoder and multimodal joint-training, outperforms a baseline system in both objective and subjective evaluations.

Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint-training, incorporating spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system in terms of both objective evaluation and subjective evaluation metrics. Moreover, results demonstrate that joint mel-spectrogram and deep feature loss training can effectively improve system performance.
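The joint mel-spectrogram and deep feature loss mentioned above can be illustrated with a minimal sketch. The paper's actual loss functions, weighting, and feature extractor are not given in this summary, so everything here is an assumption: L1 distance for both terms, the weights `alpha` and `beta`, and a hypothetical `deep_features` function (one fixed random linear layer with tanh) standing in for whatever pretrained network supplies the deep features.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two equally shaped arrays."""
    return float(np.mean(np.abs(a - b)))

def deep_features(mel, weights):
    """Hypothetical stand-in for a pretrained feature extractor:
    a single fixed linear projection followed by tanh."""
    return np.tanh(mel @ weights)

def joint_loss(mel_pred, mel_true, weights, alpha=1.0, beta=1.0):
    """Weighted sum of a mel-spectrogram term and a deep feature term.
    alpha and beta are assumed weights, not values from the paper."""
    mel_term = l1(mel_pred, mel_true)
    deep_term = l1(deep_features(mel_pred, weights),
                   deep_features(mel_true, weights))
    return alpha * mel_term + beta * deep_term

# Synthetic data: 50 frames, 80 mel bins, 32 deep feature dimensions.
rng = np.random.default_rng(0)
frames, n_mels, n_feat = 50, 80, 32
mel_true = rng.normal(size=(frames, n_mels))
mel_pred = mel_true + 0.1 * rng.normal(size=(frames, n_mels))
weights = rng.normal(size=(n_mels, n_feat)) / np.sqrt(n_mels)

print(joint_loss(mel_pred, mel_true, weights))
```

Setting `beta=0` reduces the objective to the mel-spectrogram term alone, which is one way to compare single-loss and joint-loss training in an ablation.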

