CL LG SD AS | Jun 7, 2023

Label Aware Speech Representation Learning For Language Identification

Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

CMU, DeepMind

arXiv:2306.04374v12.14 citations | h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses language recognition, a key task in speech processing, by adding language label information to representation learning. The advance appears incremental because it builds on existing self-supervised methods.

The paper tackles language identification by proposing a framework that combines self-supervised representation learning with language label information, and it reports improved performance over state-of-the-art systems on the public FLEURS and Dhwani datasets.

Speech representation learning approaches for non-semantic tasks such as language recognition have explored either supervised embedding extraction using a classifier model or self-supervised representation learning using raw data. In this paper, we propose a novel framework that combines self-supervised representation learning with language label information for the pre-training task. This framework, termed Label Aware Speech Representation (LASR) learning, uses a triplet-based objective function to incorporate language labels alongside the self-supervised loss function. The speech representations are then fine-tuned for the downstream task. The language recognition experiments are performed on two public datasets, FLEURS and Dhwani. In these experiments, we show that the proposed LASR framework improves over state-of-the-art systems on language identification. We also analyze the robustness of the LASR approach to noisy or missing labels, as well as its application to multilingual speech recognition tasks.
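The abstract describes adding a triplet-based objective over language labels to a self-supervised loss. The sketch below shows one generic way such a combination could look. It is not the paper's formulation: the squared Euclidean distance, the averaging over all valid triplets, the `margin` and `weight` parameters, and the function names `triplet_loss` and `combined_loss` are all assumptions made for illustration.

```python
import numpy as np


def triplet_loss(embeddings, labels, margin=0.5):
    """Mean hinge triplet loss over all valid (anchor, positive, negative) triples.

    Assumption: squared Euclidean distance and exhaustive triplet enumeration;
    the paper may use a different distance or triplet-selection strategy.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self   # (anchor, positive): same language, distinct items
    neg_mask = ~same             # (anchor, negative): different language

    # loss[a, p, n] = max(0, d(a, p) - d(a, n) + margin)
    loss = np.maximum(0.0, dist[:, :, None] - dist[:, None, :] + margin)
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]

    n_valid = valid.sum()
    if n_valid == 0:
        # No usable triplets in this batch (e.g. only one language present).
        return 0.0
    return float(loss[valid].sum() / n_valid)


def combined_loss(ssl_loss, embeddings, labels, weight=1.0, margin=0.5):
    """Self-supervised loss plus a weighted label-aware triplet term.

    `ssl_loss` is a placeholder scalar standing in for whatever
    self-supervised objective the model is trained with.
    """
    return ssl_loss + weight * triplet_loss(embeddings, labels, margin)
```

In this sketch, a batch containing only one language contributes no triplet term, so the objective falls back to the self-supervised loss alone.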
