CL · CV · SD · Sep 18, 2018

Language Identification with Deep Bottleneck Features

arXiv:1809.08909v2 · 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses language identification for short utterances in intelligent vehicles, but it is incremental, building on established methods such as LSTM networks and transfer learning.

The paper tackles language identification for short speech utterances in intelligent vehicles. It trains LSTM networks on bottleneck features extracted from a DNN and uses time-scale modification to extend utterance length, improving performance on 1 s and 3 s utterances from the AP17-OLR database.
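The summary compresses a two-stage pipeline. As a minimal NumPy sketch of the inference path only, assuming a ReLU DNN whose second layer is the bottleneck, a standard single-layer LSTM without peepholes, and a softmax read-out from the final hidden state, the code shows how acoustic frames become bottleneck features and then language posteriors. The helper names, layer sizes and read-out are hypothetical, the weights are random rather than trained, and this is not the authors' implementation.

```python
import numpy as np

def bottleneck_features(acoustic, layers, bn_layer):
    """Forward frames (T x F) through a DNN and stop at the narrow
    bottleneck layer; the layers above it are never evaluated."""
    h = acoustic
    for k, (W, b) in enumerate(layers):
        h = h @ W + b
        if k == bn_layer:
            return h                  # linear bottleneck output (assumption)
        h = np.maximum(h, 0.0)        # ReLU hidden units (assumption)
    raise ValueError("bn_layer is beyond the last layer")

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_language_posteriors(feats, W, U, b, W_out, b_out):
    """Single-layer LSTM over bottleneck frames (T x D); the final hidden
    state is mapped to a softmax over languages (an assumed read-out)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in feats:
        z = W @ x_t + U @ h + b       # the four gate pre-activations, stacked
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        o = sigmoid(z[2 * H:3 * H])   # output gate
        g = np.tanh(z[3 * H:])        # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

# Random (untrained) weights, only to show the shapes involved.
rng = np.random.default_rng(0)
T, F, BN, H, N_LANG = 100, 40, 50, 64, 10   # hypothetical sizes
dnn = [(rng.normal(0, 0.1, (F, 256)), np.zeros(256)),
       (rng.normal(0, 0.1, (256, BN)), np.zeros(BN)),
       (rng.normal(0, 0.1, (BN, 500)), np.zeros(500))]  # phone-target layer, unused
bn = bottleneck_features(rng.normal(size=(T, F)), dnn, bn_layer=1)
post = lstm_language_posteriors(
    bn,
    rng.normal(0, 0.1, (4 * H, BN)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H),
    rng.normal(0, 0.1, (N_LANG, H)), np.zeros(N_LANG))
print(bn.shape, post.shape)
```

In the described system the DNN is trained on Mandarin acoustic-phonetic targets, after which only its bottleneck outputs are fed to the LSTM, which is trained on language labels; both training loops are omitted here.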

In this paper, we propose an end-to-end short-utterance speech language identification (SLD) approach based on a Long Short-Term Memory (LSTM) neural network, which is especially suitable for SLD applications in intelligent vehicles. The features used for LSTM learning are generated by a transfer learning method: bottleneck features of a deep neural network (DNN) trained for Mandarin acoustic-phonetic classification are used to train the LSTM. To improve SLD accuracy on short utterances, a phase-vocoder-based time-scale modification (TSM) method is used to reduce and increase the speech rate of the test utterance. By splicing the normal, rate-reduced, and rate-increased utterances, we extend the length of test utterances and thereby improve the performance of the SLD system. Experimental results on the AP17-OLR database show that the proposed methods improve SLD performance, especially on short utterances of 1 s and 3 s duration.
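The abstract does not give the TSM settings. A minimal sketch of the test-time extension, assuming a textbook STFT phase vocoder (Hann window, 512-point FFT, 128-sample hop, 16 kHz audio) and hypothetical rates of 0.9 (slower) and 1.1 (faster), could look like this; `time_stretch` and `extend_utterance` are illustrative names, not the authors' code.

```python
import numpy as np

N_FFT, HOP = 512, 128  # hypothetical STFT settings (32 ms / 8 ms at 16 kHz)

def stft(x):
    """Hann-windowed STFT; returns a (bins x frames) complex matrix."""
    win = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(S):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(N_FFT)
    frames = np.fft.irfft(S.T, n=N_FFT, axis=1)
    length = N_FFT + HOP * (len(frames) - 1)
    y, norm = np.zeros(length), np.zeros(length)
    for i, fr in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += fr * win
        norm[i * HOP:i * HOP + N_FFT] += win ** 2
    ok = norm > 1e-8
    y[ok] /= norm[ok]
    return y

def phase_vocoder(S, rate):
    """Resample STFT frames at step `rate` (rate > 1 speeds up, rate < 1
    slows down) while propagating phase so that pitch is preserved."""
    n_bins, n_frames = S.shape
    steps = np.arange(0, n_frames - 1, rate)
    steps = steps[steps < n_frames - 1]          # guard against float round-off
    omega = 2 * np.pi * HOP * np.arange(n_bins) / N_FFT  # expected advance per hop
    phase = np.angle(S[:, 0])
    out = np.empty((n_bins, len(steps)), dtype=complex)
    for k, t in enumerate(steps):
        i, frac = int(t), t - int(t)
        mag = (1 - frac) * np.abs(S[:, i]) + frac * np.abs(S[:, i + 1])
        out[:, k] = mag * np.exp(1j * phase)
        dphi = np.angle(S[:, i + 1]) - np.angle(S[:, i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
        phase += omega + dphi
    return out

def time_stretch(x, rate):
    return istft(phase_vocoder(stft(x), rate))

def extend_utterance(x, slow_rate=0.9, fast_rate=1.1):
    """Hypothetical helper: splice the normal, rate-reduced and
    rate-increased versions of a test utterance into one longer signal."""
    return np.concatenate([x, time_stretch(x, slow_rate), time_stretch(x, fast_rate)])

sr = 16000
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
y = extend_utterance(x)
print(len(x), len(y))
```

A stretch rate below 1 lengthens the signal and one above 1 shortens it, while the phase propagation keeps the pitch unchanged, so under these assumed rates a 1 s utterance becomes roughly 3 s of spliced audio. A library routine such as `librosa.effects.time_stretch` performs the same kind of phase-vocoder stretching and could replace the hand-written version.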
