AS CL Jun 23, 2023

Implementing contextual biasing in GPU decoder for online ASR

Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

arXiv:2306.15685v14.36 citations h-index: 31 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving ASR predictions with contextual information in real-time GPU decoding. The advance is incremental because it builds on existing GPU decoder frameworks.

The paper tackles the problem of integrating contextual biasing into real-time GPU decoding for online automatic speech recognition (ASR). It proposes an approach that enables dynamic context switching and flexible rescoring of each speech segment directly on the GPU. The code is publicly released and tested on open-source datasets.

GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not yet been properly investigated. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have demonstrated the viability of lattice rescoring in decoding and of biasing language model (LM) weights in offline and online CPU scenarios. In real-time GPU decoding, partial recognition hypotheses are produced without lattice generation, which makes the implementation of biasing more complex. The paper proposes and describes an approach to integrate contextual biasing into real-time GPU decoding while exploiting the standard Kaldi GPU decoder. Besides biasing partial ASR predictions, the approach also permits dynamic context switching, allowing flexible rescoring for each speech segment directly on the GPU. The code is publicly released and tested with open-source test sets.
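To make the idea of biasing partial hypotheses with a switchable context concrete, here is a minimal CPU-side sketch in Python. It is not the paper's implementation and does not use the Kaldi API: the names `Hypothesis`, `count_phrase`, and `bias_hypotheses`, the flat `bonus` value, and the example phrases are all hypothetical, chosen only for illustration. Each partial hypothesis receives a score boost for every context phrase it contains, and the phrase list can be swapped for each speech segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]
    score: float  # log-domain score; higher is better (assumed convention)


def count_phrase(words: List[str], phrase: List[str]) -> int:
    """Count occurrences of `phrase` as a contiguous word sequence in `words`."""
    n = len(phrase)
    return sum(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def bias_hypotheses(hyps: List[Hypothesis],
                    context_phrases: List[str],
                    bonus: float = 1.0) -> List[Hypothesis]:
    """Add `bonus` per matched context phrase, then sort best-first.

    Illustrative stand-in for contextual biasing: a real decoder would
    adjust LM or graph weights rather than rescoring word strings.
    """
    phrases = [p.split() for p in context_phrases]
    rescored = []
    for hyp in hyps:
        hits = sum(count_phrase(hyp.words, p) for p in phrases)
        rescored.append(Hypothesis(hyp.words, hyp.score + bonus * hits))
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# Partial hypotheses for one speech segment (made-up scores).
partial = [
    Hypothesis(["call", "jon", "smyth"], -3.5),
    Hypothesis(["call", "john", "smith"], -4.0),
]

# Dynamic context switching: a different phrase list per segment.
best_a = bias_hypotheses(partial, ["john smith"])[0]
best_b = bias_hypotheses(partial, ["jon smyth"])[0]
```

With the first context the "john smith" hypothesis overtakes the originally higher-scoring one. With the second context the original ranking is kept and its margin widens. The same partial hypotheses are rescored differently simply by changing the phrase list passed in for that segment.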

