GPU-Accelerated Forward-Backward Algorithm with Application to Lattice-Free MMI
This work offers a faster way to train LF-MMI speech recognition systems, although the contribution is incremental because it builds on existing GPU acceleration techniques.
The authors address the slow training of lattice-free MMI (LF-MMI) acoustic models by reformulating the forward-backward algorithm as sparse matrix operations in a semiring, which makes it amenable to GPU acceleration. Their implementation trains about twice as fast as the existing PyChain system and does not need approximations such as the leaky HMM.
We propose to express the forward-backward algorithm in terms of operations between sparse matrices in a specific semiring. This new perspective naturally leads to a GPU-friendly algorithm that is easy to implement in Julia or in any programming language with native support for semiring algebra. We use this new implementation to train a TDNN with the LF-MMI objective function, and we compare the training time of our system with that of PyChain, a recently introduced C++/CUDA implementation of the LF-MMI loss. Our implementation is about twice as fast and does not require any approximation such as the "leaky-HMM".
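To make the "sparse matrices in a semiring" idea concrete, here is a minimal sketch in Python rather than the authors' Julia code. It assumes the log semiring (semiring "plus" is log-sum-exp, semiring "times" is ordinary addition of log-weights) and stores the transition matrix as a hypothetical list of arcs `(src, dst, log_weight)`, i.e. its non-zero entries in COO form. The names `log_add`, `vecmat`, `matvec` and `forward_backward` are illustrative, not the paper's API. Each time step of the forward pass is one sparse vector-matrix product and each step of the backward pass is one sparse matrix-vector product.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")  # semiring zero of the log semiring

# Hypothetical sparse representation: the non-zero entries of the
# transition matrix T as (src, dst, log_weight) triples (COO form).
Arcs = List[Tuple[int, int, float]]


def log_add(a: float, b: float) -> float:
    """Semiring plus (⊕) of the log semiring: log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def vecmat(x: List[float], arcs: Arcs, n: int) -> List[float]:
    """Row vector times sparse matrix: y[j] = ⊕_i x[i] ⊗ T[i, j]."""
    y = [NEG_INF] * n
    for i, j, w in arcs:
        y[j] = log_add(y[j], x[i] + w)  # semiring times (⊗) is "+" on logs
    return y


def matvec(arcs: Arcs, x: List[float], n: int) -> List[float]:
    """Sparse matrix times column vector: y[i] = ⊕_j T[i, j] ⊗ x[j]."""
    y = [NEG_INF] * n
    for i, j, w in arcs:
        y[i] = log_add(y[i], w + x[j])
    return y


def forward_backward(
    arcs: Arcs,
    init: List[float],
    final: List[float],
    emissions: List[List[float]],
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Return log-alphas, log-betas and the total log-likelihood."""
    n, num_frames = len(init), len(emissions)

    # Forward: alpha_1 = init ⊗ e_1, alpha_t = (alpha_{t-1} T) ⊗ e_t.
    alpha = [[p + e for p, e in zip(init, emissions[0])]]
    for t in range(1, num_frames):
        propagated = vecmat(alpha[-1], arcs, n)
        alpha.append([p + e for p, e in zip(propagated, emissions[t])])

    # Backward: beta_T = final, beta_t = T (e_{t+1} ⊗ beta_{t+1}).
    beta: List[List[float]] = [[] for _ in range(num_frames)]
    beta[-1] = list(final)
    for t in range(num_frames - 2, -1, -1):
        weighted = [e + b for e, b in zip(emissions[t + 1], beta[t + 1])]
        beta[t] = matvec(arcs, weighted, n)

    # Total weight: ⊕_i alpha_T[i] ⊗ final[i].
    total = NEG_INF
    for a, f in zip(alpha[-1], final):
        total = log_add(total, a + f)
    return alpha, beta, total
```

The plain Python loops only make the semiring algebra explicit. On a GPU, one would presumably replace `vecmat` and `matvec` with batched sparse kernels that use ⊕ and ⊗ in place of ordinary addition and multiplication, which is the kind of substitution a language with native semiring support makes easy.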