Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models
This work reports consistent relative WER improvements, ranging from 4.3% to 35.3% depending on the dataset, from adapting self-supervised pretrained acoustic models with lattice-free MMI. The results are relevant to researchers and practitioners working on ASR.
This paper proposes lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. The authors show that fine-tuning with LFMMI consistently yields relative WER improvements of 10% and 35.3% on the clean and other test sets of Librispeech (100h), 10.8% on Switchboard (300h), 4.3% on Swahili (38h), and 4.4% on Tagalog (84h), compared to a baseline trained only on supervised data.
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI on three different datasets. Our results show that by fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the clean and other test sets of Librispeech (100h), 10.8% on Switchboard (300h), 4.3% on Swahili (38h), and 4.4% on Tagalog (84h), compared to the baseline trained only with supervised data.
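The improvements above are relative rather than absolute WER reductions. As a minimal sketch of that arithmetic, assuming the usual definition (baseline WER minus adapted WER, divided by baseline WER), the helper below computes the percentage. The function name and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER improvement in percent.

    Assumes the common definition: 100 * (baseline - adapted) / baseline.
    Both arguments are WERs expressed in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Illustrative values only (hypothetical, not results from the paper):
# a baseline at 10.0% WER and an adapted model at 9.0% WER.
example = relative_wer_improvement(baseline_wer=10.0, adapted_wer=9.0)
print(f"{example:.1f}% relative improvement")
```

Under this definition, the same absolute gain counts for more when the baseline WER is already low, which is why relative figures are not directly comparable to absolute WER differences.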