Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models
This work gives speech recognition researchers and engineers a tool that combines PyTorch's flexibility with Kaldi's proven training methods. The contribution is incremental: it builds on existing frameworks and introduces no new algorithms.
The authors tackled the challenge of integrating Kaldi's LF-MMI training framework with PyTorch for acoustic model development, resulting in a wrapper package that enables flexible model design and includes features like parallel training and decoding.
We present a simple wrapper for training acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short for PyTorch Kaldi wrapper), lets the user exploit the flexibility that PyTorch provides in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to PyTorch, including parallel training when multi-GPU environments are unavailable and decoding with graphs created in Kaldi. The package is available on GitHub at https://github.com/idiap/pkwrap.
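To illustrate what exposing a cost function as an autograd function means in PyTorch, here is a minimal sketch. It is not pkwrap's code and does not compute LF-MMI. The class name ToySequenceCost is hypothetical, and the objective is a stand-in (summed negative log-probability of a target class per frame). The gradient is computed by hand in the forward pass, which is assumed here to mirror how a wrapper might receive derivatives from an external library rather than from PyTorch's own differentiation.

```python
import torch


class ToySequenceCost(torch.autograd.Function):
    """Hypothetical stand-in for an externally computed cost (NOT LF-MMI)."""

    @staticmethod
    def forward(ctx, logits, targets):
        # logits: (frames, classes); targets: (frames,) integer class ids
        log_probs = torch.log_softmax(logits, dim=1)
        picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        # Hand-computed derivative of the cost w.r.t. logits (softmax - one_hot).
        # Assumption: a real wrapper would obtain this from the external toolkit.
        posteriors = log_probs.exp()
        one_hot = torch.zeros_like(posteriors).scatter_(1, targets.unsqueeze(1), 1.0)
        ctx.save_for_backward(posteriors - one_hot)
        return -picked.sum()

    @staticmethod
    def backward(ctx, grad_output):
        (grad_logits,) = ctx.saved_tensors
        # One return value per forward input; integer targets get no gradient.
        return grad_output * grad_logits, None


# Usage: the function plugs into a normal training step via .apply()
torch.manual_seed(0)
logits = torch.randn(5, 3, requires_grad=True)  # e.g. the output of any nn.Module
targets = torch.tensor([0, 2, 1, 1, 0])
loss = ToySequenceCost.apply(logits, targets)
loss.backward()  # fills logits.grad using the hand-written backward
```

Because the cost is an ordinary autograd function, any network built from PyTorch modules can be trained against it with a standard optimizer loop, which is the flexibility the abstract refers to.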