Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition
This work addresses noise robustness in speech recognition systems, particularly for unseen noise conditions, but is incremental as it builds on the existing SPLICE framework.
The paper tackles noise robust speech recognition by modifying the SPLICE algorithm to improve performance across noise conditions, especially unseen ones, and extends it to non-stereo datasets, achieving absolute improvements of up to 10.37% over baselines.
This paper proposes a modification to the training process of the popular SPLICE algorithm for noise robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. The modified framework is then extended to non-stereo datasets, which require clean and noisy training utterances but not their stereo counterparts. Finally, a computationally efficient MLLR-based run-time noise adaptation method is proposed within the SPLICE framework. The modified SPLICE shows an 8.6% absolute improvement over SPLICE on Test C of the Aurora-2 database, and 2.93% overall. The non-stereo method shows 10.37% and 6.93% absolute improvements over the Aurora-2 and Aurora-4 baseline models, respectively. Run-time adaptation in the modified framework shows a 9.89% absolute improvement over SPLICE on Test C, and 4.96% overall with respect to standard MLLR adaptation on HMMs.
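For readers unfamiliar with the baseline, the following is a minimal sketch of conventional stereo-based SPLICE in its commonly described bias-only form. It is not the modified training proposed here. It assumes a diagonal-covariance GMM over noisy features has already been trained. Each component's bias is the posterior-weighted mean of the clean-minus-noisy difference over stereo frame pairs, and a noisy frame is enhanced by adding the posterior-weighted sum of biases. All function and variable names are hypothetical, chosen for illustration only.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def posteriors(y, weights, means, variances):
    """p(k | y) for every component k of the (assumed pre-trained) noisy GMM."""
    logs = [math.log(w) + log_gauss_diag(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                      # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def train_biases(clean, noisy, weights, means, variances):
    """Per-component bias: posterior-weighted mean of (x - y) over stereo pairs."""
    K, D = len(weights), len(noisy[0])
    num = [[0.0] * D for _ in range(K)]
    den = [0.0] * K
    for x, y in zip(clean, noisy):       # x and y are time-aligned stereo frames
        post = posteriors(y, weights, means, variances)
        for k in range(K):
            den[k] += post[k]
            for d in range(D):
                num[k][d] += post[k] * (x[d] - y[d])
    return [[n / den[k] if den[k] > 0 else 0.0 for n in num[k]]
            for k in range(K)]

def enhance(y, biases, weights, means, variances):
    """Clean-feature estimate: y plus the posterior-weighted sum of biases."""
    post = posteriors(y, weights, means, variances)
    return [yi + sum(post[k] * biases[k][d] for k in range(len(biases)))
            for d, yi in enumerate(y)]

# Toy 1-D example with two well-separated components (illustrative values only).
weights, means, variances = [0.5, 0.5], [[0.0], [10.0]], [[1.0], [1.0]]
clean, noisy = [[1.0], [8.0]], [[0.0], [10.0]]
biases = train_biases(clean, noisy, weights, means, variances)
enhanced = enhance([0.0], biases, weights, means, variances)
```

Because the bias estimate needs a clean frame x aligned with each noisy frame y, this baseline depends on stereo data, which is the requirement the non-stereo extension relaxes.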