SD | AI | CL | AS | Oct 15, 2022

Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation

arXiv:2210.08182v2 | 1 citation | h-index: 22
AI Analysis

This work addresses the challenge of adapting speech recognition models to accented domains without labeled data. The contribution is incremental: it builds on existing unsupervised learning paradigms.

The paper tackles unsupervised representation learning for speech recognition by proposing a method that learns domain-invariant representations via a direct mapping to linguistic information. The authors report improved adaptation ability and better results than the baselines on accented benchmarks.

Unsupervised representation learning for speech audio has attained impressive performance on speech recognition tasks, particularly when annotated speech is limited. However, the unsupervised paradigm needs to be carefully designed, and little is known about what properties these representations acquire. There is no guarantee that the model learns meaningful representations of the information that is valuable for recognition. Moreover, the ability of the learned representations to adapt to other domains still needs to be estimated. In this work, we explore learning domain-invariant representations via a direct mapping of speech representations to their corresponding high-level linguistic information. Results show that the learned latents not only capture the articulatory features of each phoneme but also enhance adaptation ability, outperforming the baseline by a large margin on accented benchmarks.

