ASSD Apr 13, 2018

Language Recognition using Time Delay Deep Neural Network

arXiv:1804.05000v1
6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses language recognition for speech processing applications. Its contribution is incremental: it builds on existing I-vector and DNN methods.

The paper tackles language recognition by using a Time Delay Deep Neural Network as the universal background model in an I-vector framework. The system is tested on fourteen languages, and a new language can be added by retraining only a logistic regression model.

This work explores the use of a monolingual Deep Neural Network (DNN) model as a universal background model (UBM) to address the problem of Language Recognition (LR) in the I-vector framework. It uses a Time Delay Deep Neural Network (TDDNN) architecture, trained as an acoustic model on an English Automatic Speech Recognition (ASR) task. A logistic regression model is trained to classify the I-vectors. The proposed system is tested on fourteen languages that include various confusion pairs, and it can be extended to a new language by retraining only the final logistic regression model. This architectural flexibility is the major advantage of the proposed system over an approach based on a single DNN classifier.
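The extension step described in the abstract (adding a language by retraining only the final classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the "I-vectors" are synthetic Gaussian clusters produced by a hypothetical `fake_ivectors` helper, the dimension is a toy value, and the classifier is a plain multinomial logistic regression trained by gradient descent. The point is only that the front end producing the vectors is never touched when the fourth language is added.

```python
import math
import random

DIM = 8  # toy I-vector dimension, chosen for illustration only

def fake_ivectors(rng, lang, n, noise=0.3):
    """Synthetic stand-ins for I-vectors of one language (a Gaussian cluster)."""
    return [[(2.0 if j == lang else 0.0) + rng.gauss(0.0, noise)
             for j in range(DIM)] for _ in range(n)]

def dataset(rng, langs, n):
    """Stack n synthetic vectors per language with integer labels."""
    X, y = [], []
    for lang in langs:
        X += fake_ivectors(rng, lang, n)
        y += [lang] * n
    return X, y

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def train_logreg(X, y, n_classes, epochs=100, lr=0.5):
    """Multinomial logistic regression, full-batch gradient descent."""
    W = [[0.0] * DIM for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * DIM for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(scores_for(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for j in range(DIM):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(DIM):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

def accuracy(model, X, y):
    W, b = model
    hits = 0
    for x, label in zip(X, y):
        s = scores_for(W, b, x)
        hits += s.index(max(s)) == label
    return hits / len(y)

rng = random.Random(0)

# Initial system: three languages.
X_train, y_train = dataset(rng, [0, 1, 2], 40)
X_test, y_test = dataset(rng, [0, 1, 2], 20)
model3 = train_logreg(X_train, y_train, n_classes=3)
acc_before = accuracy(model3, X_test, y_test)

# Add language 3: the vector-producing front end is unchanged; only the
# classifier is retrained on the old vectors plus the new language's vectors.
X_new, y_new = dataset(rng, [3], 40)
X_new_test, y_new_test = dataset(rng, [3], 20)
model4 = train_logreg(X_train + X_new, y_train + y_new, n_classes=4)
acc_after = accuracy(model4, X_test + X_new_test, y_test + y_new_test)

print(acc_before, acc_after)
```

In this sketch the existing training vectors are reused as-is and only `train_logreg` is rerun, which mirrors the flexibility argument: a single end-to-end DNN classifier would need its output layer, and likely the whole network, retrained to accommodate a new class.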
