Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features
SPARC provides a robust and interpretable intermediate representation for sEMG-based silent-speech interfaces, with potential benefits for speech prostheses and human-computer interaction.
SPARC features predict sEMG envelopes more accurately than phoneme one-hot representations across aloud, mimed, and subvocal speech in 24 subjects. Prediction accuracy is comparable for aloud and mimed speech and remains above chance for subvocal speech.
We test whether Speech Articulatory Coding (SPARC) features can linearly predict surface electromyography (sEMG) envelopes across aloud, mimed, and subvocal speech in 24 subjects. Using elastic-net multivariate temporal response function (mTRF) models with sentence-level cross-validation, we find that SPARC yields higher prediction accuracy than phoneme one-hot representations on nearly all electrodes and in all speech modes. Prediction accuracy is comparable for aloud and mimed speech and remains above chance for subvocal speech, indicating detectable articulatory activity. Variance partitioning shows a substantial unique contribution from SPARC and a minimal unique contribution from phoneme features. The mTRF weight patterns reveal anatomically interpretable relationships between electrode sites and articulatory movements, and these relationships remain consistent across modes. This study focuses on representation and encoding analysis, not end-to-end decoding, and supports SPARC as a robust and interpretable intermediate target for sEMG-based silent-speech modeling.
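
To make the analysis logic concrete, the following is a minimal sketch of a time-lagged linear encoding model with sentence-level cross-validation and variance partitioning. It is an illustration under stated assumptions, not the pipeline used in this study: it uses ridge regression as a simpler stand-in for elastic net, the data are synthetic, and all names (`lagged_design`, `cv_r2`, the feature dimensions, the lag count, the regularization strength) are hypothetical. The unique contribution of one feature set is taken as the cross-validated R^2 of the joint model minus that of the model fit on the other set alone.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Stack time-lagged copies of features (T, F) into a (T, F * n_lags) matrix."""
    T, F = features.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        # Row t of this column block holds features[t - lag]; early rows stay zero.
        X[lag:, lag * F:(lag + 1) * F] = features[:T - lag]
    return X

def cv_r2(sent_feats, sent_emg, n_lags=3, alpha=1.0, n_folds=5):
    """Cross-validated R^2 per sEMG channel, holding out whole sentences."""
    # Lags are built per sentence so they never cross sentence boundaries.
    X_sent = [lagged_design(f, n_lags) for f in sent_feats]
    folds = np.array_split(np.arange(len(X_sent)), n_folds)
    ss_res, ss_tot = 0.0, 0.0
    for test_idx in folds:
        held_out = set(test_idx.tolist())
        train_idx = [i for i in range(len(X_sent)) if i not in held_out]
        X_tr = np.vstack([X_sent[i] for i in train_idx])
        Y_tr = np.vstack([sent_emg[i] for i in train_idx])
        X_te = np.vstack([X_sent[i] for i in test_idx])
        Y_te = np.vstack([sent_emg[i] for i in test_idx])
        # Closed-form ridge solution (assumption: stand-in for elastic net).
        W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(X_tr.shape[1]),
                            X_tr.T @ Y_tr)
        resid = Y_te - X_te @ W
        ss_res = ss_res + (resid ** 2).sum(axis=0)
        ss_tot = ss_tot + ((Y_te - Y_te.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): 20 sentences, 4 continuous "articulatory"
# features, 3 one-hot "phoneme" classes, 2 sEMG channels.
rng = np.random.default_rng(0)
n_sent, T, n_lags = 20, 200, 3
true_w = rng.normal(size=(4 * n_lags, 2))
artic, phon, emg = [], [], []
for _ in range(n_sent):
    a = rng.normal(size=(T, 4))
    # Coarse categorical labels derived from one articulatory dimension.
    p = np.eye(3)[np.digitize(a[:, 0], [-0.5, 0.5])]
    y = lagged_design(a, n_lags) @ true_w + 0.5 * rng.normal(size=(T, 2))
    artic.append(a)
    phon.append(p)
    emg.append(y)

joint = [np.hstack([a, p]) for a, p in zip(artic, phon)]
r2_joint = cv_r2(joint, emg, n_lags)
r2_artic = cv_r2(artic, emg, n_lags)
r2_phon = cv_r2(phon, emg, n_lags)

# Variance partitioning: what each feature set adds beyond the other.
unique_artic = r2_joint - r2_phon
unique_phon = r2_joint - r2_artic
print("unique articulatory R^2 per channel:", unique_artic)
print("unique phoneme R^2 per channel:", unique_phon)
```

Because the synthetic sEMG is generated from the continuous features, the sketch yields a large unique articulatory contribution and a near-zero unique phoneme contribution by construction. It shows how the partitioning is computed and says nothing about the empirical result.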