SD AI LG MM | Jun 2, 2025

LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention

arXiv:2506.02083v1 | h-index: 4 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This paper addresses speaker recognition for systems operating in diverse linguistic environments and represents an incremental advance.

The paper tackles the problem of speaker recognition in multi-lingual settings by disentangling linguistic information from speaker embeddings, resulting in improved equal error rates across multiple datasets.

Speaker recognition models face challenges in multi-lingual settings due to the entanglement of linguistic information within speaker embeddings. The overlap between vocal traits such as accent, vocal anatomy, and a language's phonetic structure complicates separating linguistic and speaker information. Disentangling these components can significantly improve speaker recognition accuracy. To this end, we propose a novel disentanglement learning strategy that integrates joint learning through prefix-tuned cross-attention. This approach is particularly effective when speakers switch between languages. Experimental results show the model generalizes across monolingual and multi-lingual settings, including unseen languages. Notably, the proposed model improves the equal error rate across multiple datasets, highlighting its ability to separate language information from speaker embeddings and enhance recognition in diverse linguistic conditions.
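The abstract reports results as equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate are equal. As a rough illustration of the metric only, not the authors' evaluation code, the sketch below approximates EER by sweeping a decision threshold over verification scores. The function name `equal_error_rate` and the example scores are hypothetical and do not come from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    target_scores: similarity scores for same-speaker trials (hypothetical data).
    nontarget_scores: similarity scores for different-speaker trials.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Illustrative scores only; these are not results from the paper.
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))  # prints 0.25
```

A lower EER means the score distributions for same-speaker and different-speaker trials overlap less, which is the sense in which the abstract claims improvement across datasets.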

