CL · SD · Aug 23, 2025

Geolocation-Aware Robust Spoken Language Identification

arXiv:2508.17148v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses robustness problems in spoken language identification for applications that must handle dialectal and accented speech, and represents a strong incremental improvement over existing SSL-based methods.

The paper tackles a common failure of spoken language identification models: they struggle to classify dialects and accents of the same language as a single class. It proposes a geolocation-aware approach that incorporates language-level geolocation information into SSL-based models. The method achieves a new state-of-the-art accuracy of 97.7% on FLEURS and a 9.7% relative improvement on the ML-SUPERB 2.0 dialect set.

While Self-supervised Learning (SSL) has significantly improved Spoken Language Identification (LID), existing models often struggle to consistently classify dialects and accents of the same language as a unified class. To address this challenge, we propose geolocation-aware LID, a novel approach that incorporates language-level geolocation information into the SSL-based LID model. Specifically, we introduce geolocation prediction as an auxiliary task and inject the predicted vectors into intermediate representations as conditioning signals. This explicit conditioning encourages the model to learn more unified representations for dialectal and accented variations. Experiments across six multilingual datasets demonstrate that our approach improves robustness to intra-language variations and unseen domains, achieving a new state-of-the-art accuracy on FLEURS (97.7%) and a 9.7% relative improvement on the ML-SUPERB 2.0 dialect set.
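The abstract names two mechanisms: an auxiliary geolocation-prediction task, and injection of the predicted vector into intermediate representations as a conditioning signal. The pure-Python sketch below shows one way these could fit together. Every concrete choice is an assumption rather than the paper's confirmed design: encoding language-level geolocation as a 3-D unit vector derived from latitude/longitude, mean pooling followed by a linear predictor, additive injection of a projected vector into every frame, a cosine-distance auxiliary loss, and the weight `geo_weight`. The names `GeoConditioningLayer`, `geo_aux_loss`, and `multitask_loss` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def latlon_to_unit_vector(lat_deg: float, lon_deg: float) -> Vector:
    """Encode (latitude, longitude) as a point on the unit sphere.

    Assumed encoding: Cartesian coordinates avoid the wrap-around
    at +/-180 degrees longitude that raw lat/lon values would have.
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat)]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class GeoConditioningLayer:
    """Hypothetical module placed between intermediate SSL encoder layers:
    predict a geolocation vector from the frames, then add a projection
    of that prediction back into every frame."""

    def __init__(self, hidden_dim: int, geo_dim: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.predict = random_matrix(geo_dim, hidden_dim, rng)  # hidden -> geo
        self.inject = random_matrix(hidden_dim, geo_dim, rng)   # geo -> hidden

    def __call__(self, frames: Matrix) -> Tuple[Matrix, Vector]:
        # Mean-pool the (time x hidden) frames into one utterance vector.
        n = len(frames)
        pooled = [sum(col) / n for col in zip(*frames)]
        # Predict a geolocation vector and normalise it onto the unit sphere.
        geo = matvec(self.predict, pooled)
        norm = math.sqrt(sum(g * g for g in geo)) or 1.0
        geo = [g / norm for g in geo]
        # Conditioning: project back to hidden size and add to every frame.
        bias = matvec(self.inject, geo)
        conditioned = [[h + b for h, b in zip(frame, bias)] for frame in frames]
        return conditioned, geo


def geo_aux_loss(pred: Vector, target: Vector) -> float:
    """Assumed auxiliary loss: 1 - cosine similarity of two unit vectors."""
    return 1.0 - sum(p * t for p, t in zip(pred, target))


def multitask_loss(lid_loss: float, geo_loss: float, geo_weight: float = 0.3) -> float:
    """LID loss plus a weighted auxiliary geolocation loss (weight assumed)."""
    return lid_loss + geo_weight * geo_loss


if __name__ == "__main__":
    layer = GeoConditioningLayer(hidden_dim=8)
    frames = [[random.random() for _ in range(8)] for _ in range(5)]
    conditioned, geo = layer(frames)
    # Hypothetical language-level target: a single point for the language.
    target = latlon_to_unit_vector(48.86, 2.35)
    print(multitask_loss(lid_loss=0.7, geo_loss=geo_aux_loss(geo, target)))
```

Because the target is defined per language rather than per speaker or region, every dialect of a language is pulled toward the same geolocation, which is one plausible reading of how the conditioning encourages unified representations. In a real model the weights would be learned jointly with the LID classifier; here they are random and fixed, so the sketch only illustrates the data flow.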
