Prosodic ABX: A Language-Agnostic Method for Measuring Prosodic Contrast in Speech Representations
This work addresses the need to evaluate prosodic sensitivity in speech models. The contribution is incremental in that it extends an existing phonemic evaluation framework, the ABX discrimination task, to prosody.
The paper tackles the problem of measuring prosodic contrast in self-supervised speech model representations, which had not been directly assessed before. It introduces prosodic ABX, a language-agnostic method that requires only a handful of examples and no explicit labels, and shows that model and layer rankings are often preserved across several experimental conditions, which makes the method practical for low-resource settings.
Speech representations from self-supervised speech models (S3Ms) are known to be sensitive to phonemic contrasts, but their sensitivity to prosodic contrasts has not been directly measured. The ABX discrimination task has been used to measure phonemic contrast in S3M representations via minimal pairs. We introduce prosodic ABX, an extension of this framework that evaluates prosodic contrast with only a handful of examples and no explicit labels. We also build and release a dataset of English and Japanese minimal pairs and use it, along with a Mandarin dataset, to evaluate contrast in English stress, Japanese pitch accent, and Mandarin tone. Finally, we show that model and layer rankings are often preserved across several experimental conditions, making the method practical for low-resource settings.
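To make the ABX discrimination task concrete, the following is a minimal sketch of how an ABX error rate could be computed over triples of representations. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the distance function or how frame sequences are compared, so mean pooling over frames and cosine distance are assumed here, and the names `pool`, `cosine_distance`, and `abx_error_rate` are hypothetical. In each triple, A and X share a category (for example, the same stress pattern), B is the other member of the minimal pair, and a trial counts as an error when X lies closer to B than to A.

```python
import numpy as np


def pool(frames):
    """Mean-pool a (time, dim) array of frame features into one vector.

    Assumption: mean pooling is used only to keep the sketch short; the
    source does not say how variable-length sequences are compared.
    """
    return np.asarray(frames, dtype=float).mean(axis=0)


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def abx_error_rate(triples):
    """Fraction of (A, B, X) triples in which X is closer to B than to A.

    Each triple holds three (time, dim) arrays. X belongs to the same
    category as A. Ties are counted as half an error.
    """
    if not triples:
        raise ValueError("at least one (A, B, X) triple is required")
    errors = 0.0
    for a, b, x in triples:
        d_ax = cosine_distance(pool(a), pool(x))
        d_bx = cosine_distance(pool(b), pool(x))
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triples)


# Toy features standing in for one layer of a speech model (made-up values).
a = np.array([[1.0, 0.0], [1.0, 0.1]])    # category 1
b = np.array([[0.0, 1.0], [0.1, 1.0]])    # category 2 (minimal-pair partner)
x = np.array([[1.0, 0.05], [0.9, 0.0]])   # another token of category 1

score = abx_error_rate([(a, b, x)])
```

Under these assumptions, a lower error rate means the representation separates the two members of a minimal pair more cleanly. Running the same function on features from different models or layers would give the kind of ranking the abstract refers to.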