LG, AS, ML | Oct 21, 2019

Signal Combination for Language Identification

arXiv:1910.09687v2 | 13 citations
Originality: Incremental advance
AI Analysis

This work addresses language identification accuracy for multilingual speech recognition systems, presenting an incremental improvement over existing methods.

The paper tackles language identification in multilingual speech recognition by combining low-level acoustic signals with language-specific recognizer signals. Its deep neural network model for combining these signals reduced the error rate from 5.5% for the acoustic-only baseline to 4.3%, a 21.8% relative reduction.

Google's multilingual speech recognition system combines low-level acoustic signals with language-specific recognizer signals to better predict the language of an utterance. This paper presents our experience with different signal combination methods to improve overall language identification accuracy. We compare the performance of a lattice-based ensemble model and a deep neural network model to combine signals from recognizers with that of a baseline that only uses low-level acoustic signals. Experimental results show that the deep neural network model outperforms the lattice-based ensemble model, and it reduced the error rate from 5.5% in the baseline to 4.3%, which is a 21.8% relative reduction.
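The relative reduction quoted in the abstract follows directly from the two reported error rates. A minimal sketch of that arithmetic is below; the function name `relative_reduction` and the variable names are illustrative assumptions, and only the 5.5% and 4.3% figures come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fraction of the baseline error removed by the improved system."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Error rates in percent, as reported in the abstract.
baseline_error = 5.5  # baseline using only low-level acoustic signals
dnn_error = 4.3       # deep neural network model combining recognizer signals

reduction = relative_reduction(baseline_error, dnn_error)

print(f"absolute reduction: {baseline_error - dnn_error:.1f} percentage points")
print(f"relative reduction: {reduction:.1%}")
```

The absolute improvement is 1.2 percentage points; dividing by the 5.5% baseline gives roughly 0.218, which matches the 21.8% relative reduction stated above.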

