Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin
This work addresses speech recognition for Cantonese speakers, but it is incremental as it applies an existing transfer learning method to a new language pair.
The paper tackles the problem of building an automatic speech recognizer for low-resource Cantonese by using transfer learning from high-resource Mandarin, and reports quicker training and slight improvements in character error rate (CER).
We propose a system to develop a basic automatic speech recognizer (ASR) for Cantonese, a low-resource language, through transfer learning from Mandarin, a high-resource language. We take a time-delay neural network (TDNN) trained on Mandarin and transfer the weights of several layers to a newly initialized model for Cantonese. We experiment with the number of layers transferred, their learning rates, and pretraining i-vectors. Our key finding is that this approach allows for quicker training with less data. We find that at every epoch, log-probability is smaller for the transfer learning models than for a Cantonese-only model. The transfer learning models also show a slight improvement in character error rate (CER).
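The weight-transfer step described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the toy model (a dict of flat weight lists), the function names `init_model` and `transfer_layers`, the layer sizes, and the learning-rate factor of 0.25 are all assumptions chosen for illustration. The sketch copies the first few layers of a "Mandarin" model into a freshly initialized "Cantonese" model and assigns the transferred layers a smaller learning-rate factor, which is one common way to vary the learning rates of transferred layers.

```python
import copy
import random


def init_model(layer_sizes, seed):
    """Randomly initialise a toy model: one flat weight list per layer.

    Hypothetical stand-in for a real TDNN; layer names are layer0, layer1, ...
    """
    rng = random.Random(seed)
    return {
        f"layer{i}": [rng.uniform(-0.1, 0.1) for _ in range(n)]
        for i, n in enumerate(layer_sizes)
    }


def transfer_layers(source, target, num_layers, transferred_lr_factor=0.25):
    """Copy the first `num_layers` layers of `source` into a copy of `target`.

    Returns (new_model, lr_factors). Transferred layers get
    `transferred_lr_factor` (an assumed value, not taken from the paper);
    all remaining layers keep a factor of 1.0.
    """
    names = list(target)  # dicts preserve insertion order (Python 3.7+)
    if num_layers > len(names):
        raise ValueError("cannot transfer more layers than the target has")

    new_model = copy.deepcopy(target)
    lr_factors = {}
    for idx, name in enumerate(names):
        if idx < num_layers:
            if len(source[name]) != len(target[name]):
                raise ValueError(f"shape mismatch in {name}")
            new_model[name] = list(source[name])  # copy, do not alias
            lr_factors[name] = transferred_lr_factor
        else:
            lr_factors[name] = 1.0
    return new_model, lr_factors


# Usage: the output layers differ in size because the two languages have
# different target sets, so only the lower layers are transferred.
mandarin = init_model([8, 8, 8, 5], seed=0)
cantonese = init_model([8, 8, 8, 6], seed=1)
model, lr_factors = transfer_layers(mandarin, cantonese, num_layers=2)
print(lr_factors)
```

Varying `num_layers` and `transferred_lr_factor` in this sketch mirrors the two experimental knobs named in the abstract: how many layers are transferred and how quickly they are allowed to change during Cantonese training.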