CL SD AS | May 29, 2025

Automatic classification of stop realisation with wav2vec2.0

James Tanner, Morgan Sonderegger, Jane Stuart-Smith, Jeff Mielke, Tyler Kendall

arXiv:2505.23688v22.7 | h-index: 28 | Has Code | INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

This gives phonetic researchers a tool for scaling up annotation tasks, though the contribution is incremental: it applies an existing method to a new domain.

The researchers address the automatic annotation of variable phonetic phenomena in speech data by training wav2vec2.0 models to classify stop burst presence, achieving high accuracy in English and Japanese across different speech corpora.

Modern phonetic research regularly makes use of automatic tools for the annotation of speech data; however, few tools exist for the annotation of many variable phonetic phenomena. At the same time, pre-trained self-supervised models, such as wav2vec2.0, have been shown to perform well at speech classification tasks and to latently encode fine-grained phonetic information. We demonstrate that wav2vec2.0 models can be trained to automatically classify stop burst presence with high accuracy in both English and Japanese, and that this performance is robust across both finely-curated and unprepared speech corpora. Patterns of variability in stop realisation are replicated with the automatic annotations, and closely follow those of manual annotations. These results demonstrate the potential of pre-trained speech models as tools for the automatic annotation and processing of speech corpus data, enabling researchers to 'scale up' the scope of phonetic research with relative ease.
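The comparison between automatic and manual annotations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the records, the context names (`word-final`, `word-medial`), and the helper functions `accuracy` and `burst_rates` are all hypothetical, invented here only to show the two kinds of check involved (token-level agreement, and whether per-context burst rates follow the same pattern).

```python
from collections import defaultdict

# Hypothetical toy records: (context, manual_label, automatic_label),
# where 1 = burst present and 0 = burst absent. Not data from the paper.
records = [
    ("word-final", 1, 1),
    ("word-final", 0, 0),
    ("word-final", 0, 1),
    ("word-final", 0, 0),
    ("word-medial", 1, 1),
    ("word-medial", 1, 1),
    ("word-medial", 0, 0),
    ("word-medial", 1, 1),
]


def accuracy(rows):
    """Share of tokens where the automatic label matches the manual one."""
    return sum(manual == auto for _, manual, auto in rows) / len(rows)


def burst_rates(rows):
    """Per-context proportion of burst-present tokens, as (manual, automatic)."""
    grouped = defaultdict(list)
    for context, manual, auto in rows:
        grouped[context].append((manual, auto))
    return {
        context: (
            sum(m for m, _ in pairs) / len(pairs),
            sum(a for _, a in pairs) / len(pairs),
        )
        for context, pairs in grouped.items()
    }


overall = accuracy(records)
rates = burst_rates(records)
```

Here `overall` measures token-level agreement, while `rates` places the manual and automatic burst proportions side by side for each context, which is the kind of comparison needed to judge whether patterns of variability are replicated.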

