AS · CL · SD · Jun 10, 2023

What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model

arXiv:2306.06524v1 · 28 citations · h-index: 56
Originality: Synthesis-oriented
AI Analysis

The paper gives speech processing researchers insight into how SSL features interact with a fine-tuning task, but the contribution is incremental because it builds on existing probing methods.

The study investigated how fine-tuning a self-supervised learning model for accent identification affects its encoding of phoneme and prosody information, finding that the top 2 layers learned richer representations and layer 9 showed strong accent-specific phoneme features.

This study focuses on understanding and quantifying the change in phoneme and prosody information encoded in a Self-Supervised Learning (SSL) model that is brought about by an accent identification (AID) fine-tuning task. We address this problem through model probing. Specifically, we conduct a systematic layer-wise analysis of the representations of the Transformer layers on a phoneme correlation task and a novel word-level prosody prediction task. We compare the probing performance of the pre-trained and fine-tuned SSL models. Results show that the AID fine-tuning task steers the top 2 layers to learn richer phoneme and prosody representations. These changes share some similarities with the effects of fine-tuning with an Automatic Speech Recognition task. In addition, we observe strong accent-specific phoneme representations in layer 9. To sum up, this study provides insights into the understanding of SSL features and their interactions with fine-tuning tasks.
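The layer-wise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' probing setup: the probe here is a simple ridge regression, and the per-layer features are synthetic stand-ins for wav2vec2 hidden states. The helper names (`ridge_probe_r2`, `layerwise_probe`, `make_synthetic_layers`) and the signal strengths are hypothetical; the stronger signal in the top 2 "fine-tuned" layers is built in by hand purely to show what such a comparison looks like.

```python
import numpy as np

def ridge_probe_r2(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fit a ridge-regression probe and return R^2 on held-out data."""
    # Center features and target with training statistics only.
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    yc = y_train - y_mean
    dim = Xc.shape[1]
    # Closed-form ridge solution: (X^T X + alpha I) w = X^T y
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(dim), Xc.T @ yc)
    pred = (X_test - x_mean) @ w + y_mean
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def layerwise_probe(layers_a, layers_b, target, n_train):
    """Probe every layer of two models; return a list of (score_a, score_b)."""
    scores = []
    for Xa, Xb in zip(layers_a, layers_b):
        score_a = ridge_probe_r2(Xa[:n_train], target[:n_train],
                                 Xa[n_train:], target[n_train:])
        score_b = ridge_probe_r2(Xb[:n_train], target[:n_train],
                                 Xb[n_train:], target[n_train:])
        scores.append((score_a, score_b))
    return scores

def make_synthetic_layers(target, signal_per_layer, dim, rng):
    """Hypothetical stand-in for hidden states: Gaussian noise plus the
    target injected along one random direction, scaled per layer."""
    layers = []
    for strength in signal_per_layer:
        direction = rng.standard_normal(dim)
        direction /= np.linalg.norm(direction)
        noise = rng.standard_normal((len(target), dim))
        layers.append(noise + strength * np.outer(target, direction))
    return layers

rng = np.random.default_rng(0)
n_words, dim, n_layers = 600, 16, 12
# Synthetic word-level target, e.g. a prosody value per word (assumption).
target = rng.standard_normal(n_words)

# Assumed signal strengths: identical in both models except the top 2 layers.
pre_signal = [0.5] * n_layers
ft_signal = [0.5] * (n_layers - 2) + [3.0, 3.0]

pretrained = make_synthetic_layers(target, pre_signal, dim, rng)
finetuned = make_synthetic_layers(target, ft_signal, dim, rng)

scores = layerwise_probe(pretrained, finetuned, target, n_train=400)
for i, (pre, ft) in enumerate(scores, start=1):
    print(f"layer {i:2d}  pre-trained R2={pre:.2f}  fine-tuned R2={ft:.2f}")
```

With real models, the synthetic arrays would be replaced by hidden states extracted from each Transformer layer of the pre-trained and fine-tuned checkpoints and pooled to the word or phoneme level; the per-layer gap between the two scores is the quantity the probing analysis compares.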

