CL · SD · AS · Feb 23, 2023

ProsAudit, a prosodic benchmark for self-supervised speech models

Apple
arXiv:2302.12057v3 · 15 citations · h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized evaluation of prosodic understanding in speech models. The contribution is incremental, as it builds on existing benchmarks.

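The authors introduce ProsAudit, a benchmark for evaluating prosodic knowledge in self-supervised speech models. All evaluated models performed above chance on both tasks: identifying strong versus weak prosodic boundaries, and distinguishing pauses inserted between words from pauses inserted within words. Performance improved with more training data, and models evaluated on a language they were not trained on scored significantly lower on the lexical task.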

We present ProsAudit, a benchmark in English to assess structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks, their corresponding metrics, and an evaluation dataset. In the protosyntax task, the model must correctly identify strong versus weak prosodic boundaries. In the lexical task, the model needs to correctly distinguish between pauses inserted between words and within words. We also provide human evaluation scores on this benchmark. We evaluated a series of SSL models and found that they were all able to perform above chance on both tasks, even when evaluated on an unseen language. However, non-native models performed significantly worse than native ones on the lexical task, highlighting the importance of lexical knowledge in this task. We also found a clear effect of size, with models trained on more data performing better in the two subtasks.
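
To make "above chance" concrete, here is a minimal sketch of one way such a task could be scored. It assumes a paired setup, which the abstract does not spell out: each item is a pair of stimuli, the model assigns a score (for example a pseudo-log-probability) to each, and the item counts as correct when the expected stimulus scores higher. The function name `pairwise_accuracy`, the tie-handling rule, and the scores below are all hypothetical illustrations, not the benchmark's actual metric or data.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of pairs in which the first (expected) item scores higher.

    Ties count as half a point, so a model that cannot tell the two
    items apart lands at the 0.5 chance level. This tie rule is an
    assumption made for the sketch.
    """
    if not score_pairs:
        raise ValueError("need at least one pair")
    total = 0.0
    for expected, other in score_pairs:
        if expected > other:
            total += 1.0
        elif expected == other:
            total += 0.5
    return total / len(score_pairs)


# Hypothetical model scores: (pause between words, pause within a word).
lexical_pairs = [(-41.2, -44.0), (-38.5, -37.9), (-50.1, -52.3), (-29.0, -29.0)]
accuracy = pairwise_accuracy(lexical_pairs)
print(f"accuracy = {accuracy:.3f} (chance = 0.500)")
```

Under this assumed setup, chance is 0.5 regardless of the model, so a native versus non-native comparison reduces to comparing two accuracies computed on the same pairs.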
