Oct 28, 2025

emg2speech: synthesizing speech from electromyography using self-supervised speech models

arXiv:2510.23969v1 · 2 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work provides a neuromuscular speech interface for applications such as assistive communication, though the advance is incremental because it builds on existing self-supervised speech models.

The paper tackles the problem of synthesizing speech from electromyography (EMG) signals by leveraging self-supervised speech models. It shows that self-supervised speech features can be linearly mapped to EMG power with a correlation of r = 0.85, and it enables end-to-end EMG-to-speech generation without explicit articulatory models or vocoder training.
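The r = 0.85 figure is a correlation between measured EMG power and EMG power predicted by a linear map from SS features. A minimal sketch of how such a score can be computed follows, under assumptions not taken from the paper: SS features and EMG power are already time-aligned frame by frame, the linear map is fitted by closed-form ridge regression, the reported r is the Pearson correlation averaged over EMG channels, and the data are synthetic stand-ins. The helpers `fit_ridge`, `predict`, and `mean_pearson_r` are hypothetical names.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression W = (Xb^T Xb + alpha*I)^-1 Xb^T Y, where Xb adds a bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)

def predict(W, X):
    """Apply the fitted linear map (including bias) to new frames."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def mean_pearson_r(Y_true, Y_pred):
    """Pearson r computed per output channel, then averaged across channels (an assumed protocol)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(r.mean())

# Synthetic stand-ins (hypothetical): 2000 aligned frames, 64-dim "SS features", 8 EMG-power channels.
rng = np.random.default_rng(0)
n, d, c = 2000, 64, 8
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, c)) / np.sqrt(d)
Y = X @ W_true + 0.5 * rng.standard_normal((n, c))  # linear relation plus noise

# Fit on the first 80% of frames, evaluate on the held-out 20%.
split = int(0.8 * n)
W = fit_ridge(X[:split], Y[:split], alpha=1.0)
r = mean_pearson_r(Y[split:], predict(W, X[split:]))
print(f"held-out mean r = {r:.2f}")
```

Ridge rather than plain least squares is a design choice that keeps the fit stable when the feature dimension is large relative to the number of frames; scoring on held-out frames keeps the correlation from being inflated by overfitting.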

We present a neuromuscular speech interface that translates electromyographic (EMG) signals collected from orofacial muscles during speech articulation directly into audio. We show that self-supervised speech (SS) representations exhibit a strong linear relationship with the electrical power of muscle action potentials: SS features can be linearly mapped to EMG power with a correlation of $r = 0.85$. Moreover, EMG power vectors corresponding to different articulatory gestures form structured and separable clusters in feature space. This chain of relationships, $\text{SS features} \xrightarrow{\texttt{linear mapping}} \text{EMG power} \xrightarrow{\texttt{gesture-specific clustering}} \text{articulatory movements}$, indicates that SS models implicitly encode articulatory mechanisms. Leveraging this property, we map EMG signals directly to the SS feature space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory models or vocoder training.
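The clustering claim can be made concrete by computing frame-level EMG power vectors and scoring how well gesture labels separate them. The sketch below assumes EMG power is the per-channel mean square of the signal within fixed frames (log-scaled before clustering) and uses the silhouette coefficient as the separability measure. Both choices, the frame sizes, the three gesture templates, and the helper names `emg_power_frames` and `silhouette` are hypothetical, and the signals are synthetic noise rather than recorded EMG.

```python
import numpy as np

def emg_power_frames(emg, frame_len=400, hop=200):
    """Per-channel mean-square power for each frame of a (channels, samples) EMG array.
    Assumed definition of 'EMG power'; returns shape (n_frames, channels)."""
    n_samp = emg.shape[1]
    starts = range(0, n_samp - frame_len + 1, hop)
    return np.array([(emg[:, s:s + frame_len] ** 2).mean(axis=1) for s in starts])

def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance.
    Requires at least 2 clusters, each with at least 2 points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster mean
        a = D[i, same].mean()  # mean intra-cluster distance
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical setup: 8 EMG channels, three gestures, each with its own per-channel amplitude pattern.
rng = np.random.default_rng(1)
n_ch = 8
templates = {g: rng.uniform(0.2, 2.0, n_ch) for g in range(3)}

feats, labels = [], []
for g, amp in templates.items():
    emg = amp[:, None] * rng.standard_normal((n_ch, 4000))  # synthetic zero-mean "EMG"
    P = np.log(emg_power_frames(emg))  # log power per frame and channel
    feats.append(P)
    labels += [g] * len(P)

X = np.vstack(feats)
y = np.array(labels)
print(f"silhouette = {silhouette(X, y):.2f}")
```

A silhouette near 1 means frames sit much closer to their own gesture cluster than to any other, while values near 0 indicate overlapping clusters. Scoring a randomly shuffled copy of `y` gives a simple chance-level baseline for comparison.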

