AS, AI, CL, LG, SP · Jan 31, 2025

Language Bias in Self-Supervised Learning For Automatic Speech Recognition

arXiv:2501.19321v1
h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses language bias in ASR for multilingual applications, but the advance is incremental because it builds on SSL biases already documented in other domains.

The paper investigates language bias in multilingual self-supervised learning for automatic speech recognition, finding that models like XLS-R rely heavily on weights from languages with the most training data, bypassing traditional linguistic knowledge during fine-tuning.

Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages with the largest data contribution to the pretraining data.
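The abstract describes identifying language-specific subnetworks and comparing them across languages, but it does not give the exact procedure. The following is a minimal sketch of one common way to do this, assuming one-shot magnitude pruning as the subnetwork criterion and Jaccard overlap as the comparison metric; both choices, and the names `magnitude_mask` and `mask_overlap`, are illustrative assumptions rather than the paper's method. The weight matrices here are random stand-ins, not XLS-R weights.

```python
import numpy as np


def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Boolean mask keeping the largest-magnitude (1 - sparsity) share of weights.

    Assumption: one-shot magnitude pruning stands in for the subnetwork
    search; the source does not specify the pruning criterion.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    n_keep = max(1, int(round(flat.size * (1.0 - sparsity))))
    # argsort is ascending, so the last n_keep indices are the largest magnitudes.
    keep_idx = np.argsort(flat)[-n_keep:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(weights.shape)


def mask_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard overlap (intersection over union) between two boolean masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as identical
    return float(np.logical_and(mask_a, mask_b).sum() / union)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for one layer after fine-tuning on two languages.
    pretrained = rng.normal(size=(64, 64))
    finetuned_lang_a = pretrained + 0.05 * rng.normal(size=(64, 64))
    finetuned_lang_b = pretrained + 0.05 * rng.normal(size=(64, 64))

    mask_a = magnitude_mask(finetuned_lang_a, sparsity=0.8)
    mask_b = magnitude_mask(finetuned_lang_b, sparsity=0.8)
    print("subnetwork overlap:", mask_overlap(mask_a, mask_b))
```

Under these assumptions, a high overlap between masks obtained from different fine-tuning languages would be consistent with the abstract's claim that fine-tuning builds on a shared set of weights rather than on language-specific ones. Testing a subnetwork on another language would additionally require applying the mask to the model and measuring ASR error, which this sketch omits.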

