AS · CL · SD · Jun 10, 2023

What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model

Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen

arXiv:2306.06524v1

Originality: Synthesis-oriented

AI Analysis

The paper gives speech-processing researchers insight into how SSL features interact with fine-tuning tasks, but its contribution is incremental because it builds on existing probing methods.

The study investigated how fine-tuning a self-supervised learning (SSL) model for accent identification affects how it encodes phoneme and prosody information. It found that the top two Transformer layers learned richer phoneme and prosody representations and that layer 9 showed strong accent-specific phoneme features.

This study focuses on understanding and quantifying the changes in phoneme and prosody information encoded in a Self-Supervised Learning (SSL) model that are brought about by fine-tuning it on an accent identification (AID) task. We address this problem through model probing. Specifically, we conduct a systematic layer-wise analysis of the Transformer layer representations on a phoneme correlation task and a novel word-level prosody prediction task, and we compare the probing performance of the pre-trained and fine-tuned SSL models. Results show that AID fine-tuning steers the top two layers toward richer phoneme and prosody representations. These changes share some similarities with the effects of fine-tuning on an Automatic Speech Recognition (ASR) task. In addition, we observe strong accent-specific phoneme representations in layer 9. In sum, this study provides insights into SSL features and their interactions with fine-tuning tasks.
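The layer-wise comparison described above can be illustrated with a generic linear probe. The sketch below is hypothetical and is not the authors' procedure. It assumes each layer's output is available as a list of frame-level feature vectors and that word boundaries are given as (start_frame, end_frame) spans. It also assumes the word-level target is one scalar prosodic value per word, such as a duration. All function names (`mean_pool_words`, `fit_ridge`, `probe_layers`, `layerwise_gain`) are illustrative. For each layer, the sketch mean-pools frames into word vectors, fits a ridge regression, and reports R². Subtracting pre-trained scores from fine-tuned scores then gives a per-layer change in how linearly decodable the target is.

```python
"""Hypothetical layer-wise linear-probe sketch (not the paper's code).

Assumptions: per-layer frame features are lists of float vectors, word spans
are [start, end) frame indices, and each word has one scalar prosody target.
"""
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool_words(frames: Sequence[Vector],
                    spans: Sequence[Tuple[int, int]]) -> List[Vector]:
    """Average the frame vectors inside each [start, end) word span."""
    pooled = []
    for start, end in spans:
        chunk = frames[start:end]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def fit_ridge(x: List[Vector], y: List[float], lam: float = 1e-3) -> Vector:
    """Solve (X^T X + lam*I) w = X^T y; the appended bias column is not penalized."""
    rows = [list(xi) + [1.0] for xi in x]
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i < n - 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= factor * a[col][j]
            b[k] -= factor * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def r_squared(x: List[Vector], y: List[float], w: Vector) -> float:
    """Coefficient of determination of the linear probe (targets must vary)."""
    preds = [sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) for row in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def probe_layers(layers: Sequence[Sequence[Vector]],
                 spans: Sequence[Tuple[int, int]],
                 targets: List[float]) -> List[float]:
    """One R^2 per layer; fitted and scored in-sample here only for brevity."""
    scores = []
    for frames in layers:
        feats = mean_pool_words(frames, spans)
        w = fit_ridge(feats, targets)
        scores.append(r_squared(feats, targets, w))
    return scores


def layerwise_gain(pretrained_layers, finetuned_layers, spans, targets) -> List[float]:
    """Per-layer change in probe R^2 after fine-tuning (positive = more decodable)."""
    pt = probe_layers(pretrained_layers, spans, targets)
    ft = probe_layers(finetuned_layers, spans, targets)
    return [f - p for p, f in zip(pt, ft)]
```

In practice, a probe like this would be trained and scored on separate data splits so that in-sample fit is not mistaken for encoded information. A phoneme-level analysis would use a different target or metric than the scalar regression shown here.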

