CL · SD · May 20

Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

arXiv:2605.20920
Predicted impact: top 96% in CL · last 90 days
Originality: Incremental advance
AI Analysis

This work provides a new evaluation method for articulatory speech synthesis, which is a niche domain, but the results are preliminary and the improvement over existing metrics is not quantified.

The authors propose using phoneme recognition with articulatory features as a proxy to evaluate speech articulation synthesis, addressing the lack of objective metrics. They train a neural network on acoustic and articulatory features, test it on different synthetic articulatory features, and report that the articulatory feature set is phonetically rich. That articulatory features capture phonetic nuances better than traditional metrics is stated as the paper's hypothesis.

Recent advances in machine learning and the availability of articulatory datasets allow vocal tract synthesis to be conditioned on phonetic sequences, a primary task of articulatory speech synthesis. However, quality assessment still lacks a clear definition. Ranking generative models is generally difficult because judgments are subjective, and articulatory synthesis adds the further difficulty of requiring specialized knowledge of vocal tract anatomy and acoustics. To address this problem, this paper proposes evaluating speech articulation synthesis using phoneme recognition as a proxy. Our hypothesis is that phoneme recognition using articulatory features captures nuances in phoneme production, such as correct places of articulation, that traditional metrics (e.g., point-wise distance metrics) miss. We train a neural network with acoustic and articulatory features extracted from a single-speaker RT-MRI dataset. Then, we compare recognition performance when testing the model with different synthetic articulatory features. Our results show that our articulatory feature set is phonetically rich and helps explore additional dimensions of speech articulation synthesis.
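
As a rough illustration of the proxy idea, the sketch below ranks synthesizers by how well a recognizer recovers the intended phonemes from their output. The abstract does not say how recognition performance is scored, so the use of phoneme error rate (edit distance divided by reference length) is an assumption, and the synthesizer names and phoneme sequences are hypothetical placeholders rather than data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed scoring rule)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the phonemes the synthesizers were conditioned on,
# and what a recognizer decoded from each synthesizer's articulatory features.
reference = ["b", "a", "t", "o"]
recognized = {
    "synth_A": ["b", "a", "t", "o"],  # all phonemes recovered
    "synth_B": ["b", "a", "k", "o"],  # wrong place of articulation for /t/
    "synth_C": ["a", "o"],            # consonants not recovered
}

scores = {name: phoneme_error_rate(reference, hyp)
          for name, hyp in recognized.items()}
ranking = sorted(scores, key=scores.get)  # lower error rate ranks first
```

A lower error rate here means the synthetic articulation is more recognizable as the intended phoneme sequence. A wrong place of articulation, as in the hypothetical `synth_B`, shows up directly as a substitution error.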

