Transformer-based language modeling and decoding for conversational speech recognition
This work addresses the challenge of efficient decoding for conversational speech recognition, but its contribution appears incremental, since it adapts existing transformer methods to a specific framework.
The authors tackled the problem of integrating transformer-based language models into conversational speech recognition by proposing an efficient lattice re-scoring method within a weighted finite-state transducer framework, which leverages the transformer's ability to capture long-range history and avoid sequential computation.
We propose a method for using a transformer-based language model in conversational speech recognition. Specifically, we focus on decoding efficiently in a weighted finite-state transducer framework. We present a lattice re-scoring approach that exploits the longer-range history captured by a transformer-based language model and takes advantage of the transformer's ability to avoid sequential computation.
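To make the re-scoring idea concrete, the following is a minimal sketch of generic lattice re-scoring, not the authors' implementation. It assumes a tiny lattice stored as a dict of arcs (`node -> [(next_node, word, acoustic_logprob)]`). Exact re-scoring with an unbounded-history language model generally cannot merge lattice states the way an n-gram model can, so the sketch expands every path. It then scores each path's full word sequence with a single language-model call and interpolates that score with the acoustic score. All names (`enumerate_paths`, `toy_lm_logprobs`, `rescore`, `lm_weight`, `EXAMPLE_LATTICE`) are hypothetical. `toy_lm_logprobs` is a unigram table standing in for a transformer, which would instead return history-conditioned per-token scores for the whole sequence from one parallel forward pass.

```python
import math
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical lattice format: node -> list of (next_node, word, acoustic_logprob).
Lattice = Dict[int, List[Tuple[int, str, float]]]

EXAMPLE_LATTICE: Lattice = {
    0: [(1, "the", -0.1)],
    1: [(2, "cat", -1.0), (2, "cap", -0.9)],  # acoustics slightly prefer "cap"
    2: [(3, "sat", -0.2)],
}


def enumerate_paths(lattice: Lattice, start: int,
                    final: int) -> Iterator[Tuple[List[str], float]]:
    """Yield (word_sequence, total_acoustic_logprob) for every start->final path.

    Paths are expanded explicitly with a depth-first search, because a
    full-history LM score depends on the entire word sequence.
    """
    stack: List[Tuple[int, List[str], float]] = [(start, [], 0.0)]
    while stack:
        node, words, am = stack.pop()
        if node == final:
            yield words, am
            continue
        for nxt, word, logp in lattice.get(node, []):
            stack.append((nxt, words + [word], am + logp))


def toy_lm_logprobs(words: List[str]) -> List[float]:
    """Hypothetical stand-in for a transformer LM.

    Returns one log-probability per token for the whole sequence in a single
    call. A real transformer would condition each token on its full preceding
    history; this toy version only looks words up in a fixed unigram table.
    """
    table = {"the": 0.4, "cat": 0.3, "cap": 0.05, "sat": 0.25}
    return [math.log(table.get(w, 1e-3)) for w in words]


def rescore(lattice: Lattice, start: int, final: int,
            lm_weight: float = 0.5) -> Optional[Tuple[float, List[str]]]:
    """Return (combined_score, words) for the best path, or None if none exists."""
    best: Optional[Tuple[float, List[str]]] = None
    for words, am in enumerate_paths(lattice, start, final):
        # Score the full sequence at once, then interpolate with the acoustics.
        total = am + lm_weight * sum(toy_lm_logprobs(words))
        if best is None or total > best[0]:
            best = (total, words)
    return best


print(rescore(EXAMPLE_LATTICE, 0, 3, lm_weight=0.0)[1])  # ['the', 'cap', 'sat']
print(rescore(EXAMPLE_LATTICE, 0, 3, lm_weight=0.5)[1])  # ['the', 'cat', 'sat']
```

With `lm_weight=0.0` the acoustically preferred "the cap sat" wins. With `lm_weight=0.5` the language-model score flips the decision to "the cat sat". Exhaustive path expansion grows exponentially with lattice size, so a practical system would prune or approximate. This sketch omits both.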