AS AI SP | Apr 1, 2022

Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

arXiv:2204.00465v3
23 citations
h-index: 66
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable speech representations in articulatory phonology, though it is incremental in bridging deep neural networks with existing phonological theories.

The authors address the problem of learning interpretable speech representations from articulatory kinematics. They use a neural convolutive sparse matrix factorization method to decompose the data into gestures and gestural scores, and they demonstrate its effectiveness through phoneme recognition experiments.

Most of the research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. By applying sparse constraints, the gestural scores leverage the discrete combinatorial properties of phonological gestures. Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully. The proposed work thus builds a bridge between articulatory phonology and deep neural networks to leverage informative, intelligible, interpretable, and efficient speech representations.
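
To make the decomposition concrete, here is a minimal sketch of the generic convolutive factorization model the abstract refers to. It is not the authors' neural implementation. Each gesture is a short spatiotemporal template, each gestural score is a sparse activation track, and the articulatory signal is approximated by stamping every template at the frames where its score is non-zero. The names `reconstruct` and `objective`, the list layout, the squared-error term, and the L1 penalty weight are all assumptions chosen for illustration; the paper's actual network and loss are not reproduced here. Plain Python lists are used for readability, and a practical version would use an array library.

```python
def reconstruct(gestures, scores):
    """Convolutive reconstruction (illustrative layout, not the paper's).

    gestures[k][tau][c]: value of gesture k at lag tau on articulator channel c
    scores[k][t]:        activation of gesture k at frame t
    Returns x_hat[c][t] = sum_k sum_tau gestures[k][tau][c] * scores[k][t - tau].
    """
    window = len(gestures[0])
    n_channels = len(gestures[0][0])
    n_frames = len(scores[0])
    x_hat = [[0.0] * n_frames for _ in range(n_channels)]
    for k, track in enumerate(scores):
        for onset, activation in enumerate(track):
            if activation == 0.0:
                continue  # sparse scores: most frames contribute nothing
            for tau in range(window):
                t = onset + tau
                if t >= n_frames:
                    break  # template runs past the end of the signal
                for c in range(n_channels):
                    x_hat[c][t] += activation * gestures[k][tau][c]
    return x_hat


def objective(x, gestures, scores, sparsity_weight=0.1):
    """Squared reconstruction error plus an L1 penalty on the scores.

    The L1 term is one common way to impose sparsity; it is an assumption here.
    """
    x_hat = reconstruct(gestures, scores)
    sq_err = sum(
        (a - b) ** 2
        for row, row_hat in zip(x, x_hat)
        for a, b in zip(row, row_hat)
    )
    l1 = sum(abs(v) for track in scores for v in track)
    return sq_err + sparsity_weight * l1


# One gesture spanning two frames on two channels, activated once at frame 1.
gestures = [[[1.0, 0.0], [0.5, 2.0]]]
scores = [[0.0, 1.0, 0.0, 0.0]]
x_hat = reconstruct(gestures, scores)
print(x_hat)  # [[0.0, 1.0, 0.5, 0.0], [0.0, 0.0, 2.0, 0.0]]
```

In this toy case a single non-zero score entry accounts for two frames of movement on both channels, which is the sense in which sparse gestural scores act as discrete, combinable units.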

Code Implementations: 1 repo