Densely Connected Convolutional Networks for Speech Recognition
This work addresses speech recognition accuracy for applications like transcription, but it is incremental as it adapts an existing computer vision method to a new domain.
The paper tackles acoustic modeling for speech recognition by applying Densely Connected Convolutional Networks (DenseNets), which significantly outperform other neural models such as DNNs, CNNs, and VGGs. On the Wall Street Journal dataset, a DenseNet trained on only half of the training data outperforms the other models trained on the full data set.
This paper presents our latest investigation of Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition. DenseNets are very deep, compact convolutional neural networks that have demonstrated substantial improvements over state-of-the-art results on several data sets in computer vision. Our experimental results show that DenseNets can be used for AM, significantly outperforming other neural-based models such as DNNs, CNNs, and VGGs. Furthermore, results on the Wall Street Journal corpus revealed that, with only half of the training data, DenseNet was able to outperform other models trained on the full data set by a large margin.
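For readers unfamiliar with the architecture, the dense connectivity that gives DenseNets their name can be illustrated with a minimal NumPy sketch: each layer receives the concatenation of all preceding feature maps and contributes a fixed number of new channels (the growth rate). This is an illustration only, not the configuration used in the paper. The function name `dense_block`, the input shape (3 channels, 11 frames, 40 filterbank bins), the growth rate, and the use of a per-position linear map in place of a real convolution are all assumptions made to keep the example short.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense block (hypothetical helper, not the paper's model).

    x has shape (channels, time, freq). Each layer sees the concatenation
    of the input and every earlier layer's output, and adds `growth_rate`
    new channels.
    """
    features = [x]
    for _ in range(num_layers):
        # Dense connectivity: concatenate all previous feature maps on the channel axis.
        inp = np.concatenate(features, axis=0)
        # Stand-in for a convolution: a random 1x1 (per-position) linear map over channels.
        w = rng.standard_normal((growth_rate, inp.shape[0])) * 0.1
        out = np.maximum(np.einsum("oc,ctf->otf", w, inp), 0.0)  # ReLU
        features.append(out)
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 11, 40))  # assumed input: 3 channels, 11 frames, 40 bins
y = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(y.shape)  # (51, 11, 40): 3 input channels + 4 layers * 12 new channels
```

Because every layer adds only a small number of channels while reusing all earlier features, the channel count grows linearly with depth, which is one reason such networks can be deep yet compact.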