Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR
This work addresses speech recognition for applications requiring accurate transcription, but it appears incremental as it combines existing techniques like attention and convolutional networks.
The paper tackles end-to-end automatic speech recognition (ASR) by proposing a convolutional attention-based sequence-to-sequence neural network, which achieves a 15.8% phoneme error rate on the TIMIT dataset.
This thesis introduces a sequence-to-sequence model with Luong's attention mechanism for end-to-end ASR. It also describes the neural network techniques, including batch normalization, dropout, and residual networks, that constitute the convolutional attention-based seq2seq neural network. Finally, the proposed model demonstrates its effectiveness for speech recognition, achieving a 15.8% phoneme error rate on the TIMIT dataset.
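As a rough illustration of the attention step mentioned in the abstract, the following is a minimal sketch of Luong-style attention using the dot-product score. The abstract does not say which of Luong's score functions the thesis uses, so the dot-product variant is an assumption here, and the function names (`softmax`, `luong_dot_attention`) and the toy vectors are hypothetical. The sketch covers only the attention weights and the context vector, not the convolutional encoder or the rest of the decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(decoder_state, encoder_states):
    """Luong-style attention with the dot-product score (assumed variant).

    decoder_state: current decoder hidden state, a list of floats.
    encoder_states: one hidden state per input frame, each the same
        length as decoder_state.
    Returns the attention weights and the context vector.
    """
    # Score each encoder state against the current decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, h_s))
        for h_s in encoder_states
    ]
    # Normalize the scores into a distribution over input frames.
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder frames with two-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = luong_dot_attention(decoder_state, encoder_states)
```

In Luong's formulation, the context vector is then combined with the decoder state to predict the next output symbol (a phoneme, in the TIMIT setting); that step is omitted from this sketch.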