Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models
This work addresses the challenge of adapting self-supervised models to new languages with limited data. The contribution is incremental: it builds on existing methods by incorporating additional resources during fine-tuning.
The paper tackles the problem of improving few-shot learning performance for low-resource languages. It leverages cross-lingual supervision from universal phoneset acoustic models, and target-language text for fine-tuning with the LF-MMI objective, yielding large improvements on three low-resource languages.
Self-supervised model pre-training has recently garnered significant interest, but relatively few efforts have explored using additional resources in fine-tuning these models. We demonstrate how universal phoneset acoustic models can leverage cross-lingual supervision to improve the transfer of pretrained self-supervised representations to new languages. We also show how target-language text can be used to enable and improve fine-tuning with the lattice-free maximum mutual information (LF-MMI) objective. In three low-resource languages, these techniques greatly improved few-shot learning performance.
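The abstract does not spell out the LF-MMI objective, so the following is a rough, toy illustration only. It computes a maximum mutual information loss as the log-likelihood of the frames under a graph constrained to the reference phone sequence (numerator) minus the log-likelihood under a graph that allows any phone sequence scored by a phone bigram language model (denominator). Estimating that phone LM from phonemized target-language text is an assumption made here to show one way text can feed the objective; standard LF-MMI recipes do use a phone n-gram LM in the denominator graph, but this is not presented as the authors' method. The names `phone_bigram`, `log_numerator`, `log_denominator`, and `mmi_loss`, the stay probability of 0.5, the add-k smoothing, and the toy data are all hypothetical. Real LF-MMI uses full HMM topologies and GPU forward-backward rather than this one-state-per-phone simplification.

```python
import math
from collections import Counter

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))), safe when every entry is -inf.
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def phone_bigram(texts, phones, k=1.0):
    # Hypothetical: add-k smoothed phone bigram P(b | a) estimated from phonemized
    # target-language text, plus a start distribution P(b | <s>).
    counts = Counter()
    for seq in texts:
        prev = "<s>"
        for p in seq:
            counts[(prev, p)] += 1
            prev = p
    lm = {}
    for a in ["<s>"] + phones:
        total = sum(counts[(a, b)] for b in phones) + k * len(phones)
        lm[a] = {b: (counts[(a, b)] + k) / total for b in phones}
    return lm

def log_trans(lm, a, b, p_stay=0.5):
    # Toy topology: stay in phone a with prob p_stay, otherwise jump via the phone LM.
    return math.log(p_stay * (a == b) + (1 - p_stay) * lm[a][b])

def log_denominator(acoustic, lm, phones):
    # Forward algorithm over every phone sequence the LM graph allows.
    alpha = {b: math.log(lm["<s>"][b]) + acoustic[0][b] for b in phones}
    for frame in acoustic[1:]:
        alpha = {b: logsumexp([alpha[a] + log_trans(lm, a, b) for a in phones]) + frame[b]
                 for b in phones}
    return logsumexp(list(alpha.values()))

def log_numerator(acoustic, lm, ref):
    # Forward algorithm restricted to paths that spell out the reference phones in order.
    assert all(x != y for x, y in zip(ref, ref[1:])), "toy graph assumes no adjacent repeats"
    n = len(ref)
    alpha = [-math.inf] * n
    alpha[0] = math.log(lm["<s>"][ref[0]]) + acoustic[0][ref[0]]
    for frame in acoustic[1:]:
        new = [-math.inf] * n
        for i, p in enumerate(ref):
            stay = alpha[i] + log_trans(lm, p, p)
            move = alpha[i - 1] + log_trans(lm, ref[i - 1], p) if i > 0 else -math.inf
            new[i] = logsumexp([stay, move]) + frame[p]
        alpha = new
    return alpha[-1]

def mmi_loss(acoustic, lm, phones, ref):
    # Negative MMI: -(log numerator - log denominator). It is >= 0 because every
    # numerator path is also a denominator path with the same score.
    return -(log_numerator(acoustic, lm, ref) - log_denominator(acoustic, lm, phones))

phones = ["a", "b", "k"]
text = [["k", "a", "b", "a"], ["a", "b"], ["b", "a", "k"]]  # toy phonemized text
lm = phone_bigram(text, phones)
# Per-frame log-probabilities from a hypothetical acoustic model (4 frames).
acoustic = [{p: math.log(v) for p, v in zip(phones, row)}
            for row in [(0.1, 0.1, 0.8), (0.7, 0.2, 0.1), (0.6, 0.3, 0.1), (0.1, 0.8, 0.1)]]
print(round(mmi_loss(acoustic, lm, phones, ["k", "a", "b"]), 3))
```

The loss shrinks as more of the denominator's probability mass concentrates on the reference path, which is the discriminative pressure MMI training applies. In this sketch, a better phone LM from more target-language text changes only the denominator (and the transition scores shared by both graphs).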