AS · CL · SD · May 20, 2025

Pairwise Evaluation of Accent Similarity in Speech Synthesis

arXiv:2505.14410v1 · 6 citations · h-index: 5 · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the underexplored challenge of evaluating accent similarity in speech synthesis, a problem that matters to researchers and developers in speech technology. It appears incremental, since it refines existing methods rather than introducing a new paradigm.

The paper tackles the evaluation of accent similarity in speech synthesis by improving both subjective and objective methods. On the subjective side, it refines the XAB listening test so that it reaches higher statistical significance with fewer listeners and at lower cost. On the objective side, it identifies pronunciation-related metrics that can be used alongside common metrics, and it highlights the limitations of Word Error Rate for underrepresented accents.

Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves providing listeners with transcriptions, having them highlight perceived accent differences, and implementing meticulous screening for reliability. Objectively, we utilise pronunciation-related metrics, based on distances between vowel formants and phonetic posteriorgrams, to evaluate accent generation. Comparative experiments reveal that these metrics, alongside accent similarity, speaker similarity, and Mel Cepstral Distortion, can be used. Moreover, our findings underscore significant limitations of common metrics like Word Error Rate in assessing underrepresented accents.
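
To make the notion of "statistical significance" in an XAB test concrete, the sketch below computes a one-sided exact binomial p-value for a set of listener choices. It is an illustration under stated assumptions, not the paper's analysis: the abstract does not say which statistical test the authors use, and the function name `xab_binomial_p_value` and the example counts are hypothetical. The assumed null hypothesis is that listeners cannot tell which of A or B is closer in accent to the reference X, so each choice is a coin flip.

```python
from math import comb


def xab_binomial_p_value(n_preferred: int, n_trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value for an XAB listening test.

    Returns P(X >= n_preferred) when each of n_trials independent choices
    selects the system under test with probability `chance`.
    This is an assumed analysis, not necessarily the one used in the paper.
    """
    if not 0 <= n_preferred <= n_trials:
        raise ValueError("n_preferred must lie between 0 and n_trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must lie between 0 and 1")
    # Sum the binomial probability mass over the upper tail.
    return sum(
        comb(n_trials, k) * chance**k * (1.0 - chance) ** (n_trials - k)
        for k in range(n_preferred, n_trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical counts: 60 of 80 judgements prefer system A.
    p = xab_binomial_p_value(60, 80)
    print(f"p-value: {p:.6f}")
```

Under this reading, a test design that reduces listener noise (for example through screening) yields a more lopsided split of choices, so the same p-value threshold can be reached with fewer judgements.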

