Unsupervised pretraining transfers well across languages
This work addresses the challenge of building speech recognition systems for languages with limited labeled data. The contribution is incremental, since it builds on existing contrastive predictive coding methods.
The paper tackles cross-lingual transfer in automatic speech recognition by investigating unsupervised pretraining with contrastive predictive coding. It shows that a slightly modified version of this pretraining performs on par with or better than supervised pretraining, which suggests potential benefits for low-resource languages.
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting, which assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of the CPC pretraining extracts features that transfer well to other languages, performing on par with, or even outperforming, supervised pretraining. This shows the potential of unsupervised methods for languages with few linguistic resources.
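
For readers unfamiliar with CPC, the contrastive objective it is commonly trained with (often called InfoNCE) can be sketched as follows. This is a minimal illustration of the generic loss only, not the modified pretraining described above, whose details are not given here. The function name `info_nce_loss`, the dot-product scoring, and the use of the other rows in the batch as negatives are all assumptions made for the example.

```python
import numpy as np

def info_nce_loss(predictions, targets):
    """Generic InfoNCE loss over a batch of (prediction, target) pairs.

    predictions: (N, D) array of predicted future representations.
    targets:     (N, D) array of true future representations. Row i is the
                 positive for prediction i; the other rows act as negatives
                 (an assumption of this sketch).
    """
    # Dot-product score between every prediction and every candidate target.
    logits = predictions @ targets.T  # shape (N, N)
    # Numerically stable log-softmax over the candidates in each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))
```

With uninformative (all-zero) representations the loss equals log N, the chance level for picking the positive among N candidates. It approaches zero as each prediction scores its own target far above the negatives.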