CL · HC · Jun 19, 2018

A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

arXiv:1806.07974v1 · 16 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable baselines in speech recognition research, particularly for low-resource scenarios, though it is incremental as it focuses on benchmarking and replication.

The paper addresses the inconsistent and often high phone error rates (PERs) reported for deep neural network (DNN) architectures on the TIMIT phone recognition task. It evaluates recent DNNs and provides baseline models with open-source scripts; to the authors' knowledge, its best PER is better than the best published result at the time.

In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is useful for minor languages. We prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation of this paper: to provide baseline DNNs with open-source scripts, so that future papers can easily replicate baseline results with the lowest possible PERs. To our knowledge, the best PER achieved in this survey is better than the best published PER to date.
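Since every comparison above is stated in terms of PER, a minimal sketch of how that metric is commonly computed may help. It assumes the standard definition, (substitutions + deletions + insertions) divided by the number of reference phones, obtained from a unit-cost Levenshtein alignment. The function names and the toy phone sequences are hypothetical; they are not taken from the paper's scripts, and the paper's own scoring details (for example, phone-set folding) are not reproduced here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two phone sequences."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent, pooled over all utterances (assumed definition)."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference contains no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / total


# Hypothetical toy utterance: one substitution (ae -> ah) and one deletion (final sil).
ref = "sil dh ax k ae t sil".split()
hyp = "sil dh ax k ah t".split()
per = phone_error_rate([ref], [hyp])
```

Here the errors are pooled over all utterances before dividing, rather than averaging per-utterance rates; this is the usual convention, but it is an assumption about the scoring, not something the abstract states.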

Code Implementations: 1 repo