Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN
This work gives speech recognition researchers and engineers a practical way to build DNN-based ASR systems from existing toolkits. Its contribution is incremental, since it combines established methods rather than proposing new ones.
The authors address the problem of building deep neural network (DNN)-based automated speech recognition (ASR) systems by releasing open-source recipes that integrate the Kaldi toolkit with PDNN, a lightweight deep learning toolkit. The recipes cover DNN hybrid systems, convolutional neural network (CNN) systems, and bottleneck feature systems.
The Kaldi toolkit is becoming popular for constructing automated speech recognition (ASR) systems. Meanwhile, in recent years, deep neural networks (DNNs) have shown state-of-the-art performance on various ASR tasks. This document describes our open-source recipes for implementing full-fledged DNN acoustic modeling with Kaldi and PDNN. PDNN is a lightweight deep learning toolkit developed under the Theano environment. With these recipes, we can build multiple systems, including DNN hybrid systems, convolutional neural network (CNN) systems, and bottleneck feature systems. The recipes are based directly on the Kaldi Switchboard 110-hour setup, but they are easy to adapt to new datasets.
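For readers unfamiliar with the term, the following is a minimal sketch of the step that generally characterizes a DNN hybrid system: the network's per-frame posteriors over HMM states are divided by the state priors to obtain scaled likelihoods for the decoder. This is not taken from the recipes themselves. The function name, the flooring constant, and the toy numbers are assumptions made purely for illustration.

```python
import math


def posteriors_to_scaled_loglikes(posteriors, priors, floor=1e-10):
    """Hypothetical helper: turn state posteriors p(s|x) for one frame into
    scaled log-likelihoods log p(s|x) - log p(s), which equal log p(x|s)
    up to a per-frame constant."""
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    # Floor both terms so that zero probabilities do not produce log(0).
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]


# Toy values (assumed): three HMM states, one frame.
frame_posteriors = [0.7, 0.2, 0.1]
state_priors = [0.5, 0.3, 0.2]
scaled = posteriors_to_scaled_loglikes(frame_posteriors, state_priors)
```

A state whose posterior exceeds its prior receives a positive score, and a state whose posterior falls below its prior receives a negative one.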