AS · IR · LG · SD · Nov 15, 2021

Attention based end to end Speech Recognition for Voice Search in Hindi and English

arXiv:2111.10208v1 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses speech recognition for voice search in e-commerce, providing incremental improvements to existing models.

The paper tackles automatic speech recognition for Hindi and English voice search on the Flipkart e-commerce platform. It extends the Listen-Attend-Spell (LAS) model with multi-objective training, multi-pass training, and external rescoring, and reports a 15.7% relative WER improvement over state-of-the-art LAS models and a 36.9% improvement over a phoneme-CTC system.

We describe here our work with automatic speech recognition (ASR) in the context of voice search functionality on the Flipkart e-Commerce platform. Starting with the deep learning architecture of Listen-Attend-Spell (LAS), we build upon and expand the model design and attention mechanisms to incorporate innovative approaches including multi-objective training, multi-pass training, and external rescoring using language models and phoneme based losses. We report a relative WER improvement of 15.7% on top of state-of-the-art LAS models using these modifications. Overall, we report an improvement of 36.9% over the phoneme-CTC system. The paper also provides an overview of different components that can be tuned in a LAS-based system.
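
The headline numbers above are relative word error rate (WER) improvements. As a minimal sketch of how such a figure is usually computed, the Python below implements word-level edit distance and the standard relative-improvement formula. The function names and the example queries are hypothetical illustrations, not taken from the paper, and the paper's own scoring pipeline (text normalisation, handling of Hindi and English code-mixing) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical voice-search query and recogniser output (not from the paper).
wer = word_error_rate("red running shoes for men", "red running shoe men")
print(f"WER: {wer:.2f}")
```

Under this definition, a relative improvement of 15.7% means the new system's WER is the baseline WER multiplied by (1 - 0.157). It does not mean the absolute WER dropped by 15.7 percentage points.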

