CL LG AS

Nov 20, 2019

On using 2D sequence-to-sequence models for speech recognition

Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

arXiv:1911.08888v11.38 citations

Originality: Incremental advance

AI Analysis

This work addresses automatic speech recognition and offers an incremental advance in model design.

The paper tackles speech recognition by proposing a 2DLSTM architecture that models input-output relations without attention, achieving competitive word error rates on the Switchboard 300h task.

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. In these architectures, one-dimensional input and output sequences are related by an attention mechanism, which replaces more explicit alignment processes such as those in classical HMM-based modeling. In contrast, here we apply a novel two-dimensional long short-term memory (2DLSTM) architecture to directly model the input/output relation between audio/feature vector sequences and word sequences. The proposed model is an alternative that, instead of using any type of attention component, applies a 2DLSTM layer to assimilate the context from both input observations and output transcriptions. The experimental evaluation on the Switchboard 300h automatic speech recognition task shows word error rates for the 2DLSTM model that are competitive with those of an end-to-end attention-based model.
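To make the idea of relating the two sequences on a grid more concrete, here is a minimal NumPy sketch of a generic two-dimensional LSTM recurrence. It is an illustration under stated assumptions, not the authors' implementation: the cell uses one forget gate per axis (a common multi-dimensional LSTM formulation, which may differ from the cell in the paper), and the names `init_params`, `lstm2d_forward`, `enc`, and `emb` are hypothetical. Each grid position (t, n) combines the encoder state for input frame t, the embedding of the previous output word for position n, and the recurrent states of its two neighbors (t-1, n) and (t, n-1).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, rng):
    # Hypothetical helper: one weight matrix yields all five gate
    # pre-activations (input, two forget gates, output, candidate).
    concat_dim = input_dim + 2 * hidden_dim
    W = rng.normal(scale=0.1, size=(concat_dim, 5 * hidden_dim))
    b = np.zeros(5 * hidden_dim)
    return W, b


def lstm2d_forward(enc, emb, W, b):
    """Run a generic 2D LSTM over a (T x N) grid.

    enc: (T, d_enc) encoder states, one per input frame.
    emb: (N, d_emb) embeddings of the previous output word per position.
    Returns an (N, H) array: the state at the last input frame for each
    output position, which a softmax layer could turn into word scores.
    """
    T, N = enc.shape[0], emb.shape[0]
    H = b.shape[0] // 5
    # Zero border so that the (t-1) and (n-1) lookups are valid at the edges.
    s = np.zeros((T + 1, N + 1, H))
    c = np.zeros((T + 1, N + 1, H))
    for t in range(1, T + 1):
        for n in range(1, N + 1):
            x = np.concatenate([enc[t - 1], emb[n - 1], s[t - 1, n], s[t, n - 1]])
            z = x @ W + b
            i, f_t, f_n, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(4))
            g = np.tanh(z[4 * H:])
            # Assumed cell update: separate forget gates for the time axis
            # and the output axis.
            c[t, n] = f_t * c[t - 1, n] + f_n * c[t, n - 1] + i * g
            s[t, n] = o * np.tanh(c[t, n])
    return s[T, 1:]


rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 4))   # 6 input frames, 4-dim encoder states
emb = rng.normal(size=(3, 2))   # 3 output positions, 2-dim embeddings
W, b = init_params(input_dim=4 + 2, hidden_dim=5, rng=rng)
out = lstm2d_forward(enc, emb, W, b)
```

In this sketch no attention weights are computed: the state for output position n has seen every input frame through the recurrence along t, and every earlier output word through the recurrence along n.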
