AS · CL · SD · Jun 13, 2024

Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

arXiv:2406.09290v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses accurate transcription in real-world multilingual scenarios such as broadcast and institutional speech; its contribution is an incremental improvement over existing methods.

The paper tackles spoken language identification and speech recognition for multilingual broadcast and institutional speech by proposing a cascaded system using speaker diarization and language identification, which reduces language diarization error rates by up to 10% relative and word error rates by over 8% relative on multilingual test sets.
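The figures above are relative reductions, whereas the monolingual WER change in the abstract is an absolute difference in percentage points. The minimal sketch below shows the distinction; the function names and the 25.0% / 23.0% WER values are hypothetical illustrations, not numbers from the paper.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction in percent: (baseline - system) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) * 100.0 / baseline


def absolute_change(baseline: float, system: float) -> float:
    """Absolute change in percentage points (positive means the error went up)."""
    return system - baseline


# Hypothetical example: a baseline WER of 25.0% dropping to 23.0%
# is a 2.0-point absolute reduction but an 8% relative reduction.
print(relative_reduction(25.0, 23.0))
print(absolute_change(25.0, 23.0))
```

An "8% relative" WER reduction therefore says nothing about the absolute WER level; the same relative gain corresponds to fewer points on an easy test set than on a hard one.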

This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have rarely been addressed in the SLI literature. Observing that in these domains language changes are mostly associated with speaker changes, we propose a cascaded system consisting of speaker diarization and language identification and compare it with more traditional language identification and language diarization systems. Results show that the proposed system often achieves lower language classification and language diarization error rates (up to 10% relative language diarization error reduction and 60% relative language confusion reduction) and leads to lower WERs on multilingual test sets (more than 8% relative WER reduction), while only marginally affecting speech recognition on monolingual audio (an absolute WER increase of between 0.1% and 0.7% w.r.t. monolingual ASR).
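The cascade idea (diarize speakers first, then identify the language of each speaker-homogeneous segment) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `Segment`, `cascade_language_diarization`, `toy_lid`, the `max_gap` merging rule, and the `it`/`de` language codes are all hypothetical, and a real system would call an actual LID model where `toy_lid` stands in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # label produced by the (assumed) speaker diarization step


def cascade_language_diarization(
    segments: List[Segment],
    lid_scores: Callable[[float, float], Dict[str, float]],
    max_gap: float = 0.5,
) -> List[Tuple[float, float, str]]:
    """Assign one language per speaker segment, then merge adjacent same-language regions.

    Assumes each speaker turn is monolingual, so LID only has to decide
    at speaker-change points rather than at arbitrary frames.
    """
    regions: List[Tuple[float, float, str]] = []
    for seg in sorted(segments, key=lambda s: s.start):
        scores = lid_scores(seg.start, seg.end)   # language -> score for this segment
        language = max(scores, key=scores.get)    # pick the highest-scoring language
        if regions and regions[-1][2] == language and seg.start - regions[-1][1] <= max_gap:
            # Same language and (nearly) contiguous: extend the previous region.
            regions[-1] = (regions[-1][0], seg.end, language)
        else:
            regions.append((seg.start, seg.end, language))
    return regions


def toy_lid(start: float, end: float) -> Dict[str, float]:
    """Hypothetical stand-in for an LID model: Italian up to 10 s, German afterwards."""
    return {"it": 0.9, "de": 0.1} if end <= 10.0 else {"it": 0.2, "de": 0.8}


if __name__ == "__main__":
    turns = [Segment(0.0, 4.0, "spk1"), Segment(4.0, 10.0, "spk2"), Segment(10.0, 15.0, "spk3")]
    print(cascade_language_diarization(turns, toy_lid))
```

Running the example merges the two Italian speaker turns into one region and keeps the German turn separate. Scoring whole speaker turns rather than short fixed windows gives the LID model more audio per decision, which is the practical appeal of exploiting the link between speaker and language changes.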

