Learning to adapt: a meta-learning approach for speaker adaptation
The paper addresses the challenge of improving speech recognition accuracy for unseen speakers. The contribution is incremental, as it builds on existing adaptation methods.
The paper tackles speaker adaptation in automatic speech recognition by using meta-learning to adapt all weights of an acoustic model (AM). The meta-learner outperforms a strong baseline that adapts LHUC parameters for a DNN AM with 1.5M parameters, and achieves performance comparable to LHUC for TDNN AMs.
The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers. The success of speaker adaptation methods relies on selecting weights that are suitable for adaptation and on using good adaptation schedules to update these weights without overfitting to the adaptation data. In this paper we investigate a principled way of adapting all the weights of the acoustic model using meta-learning. We show that the meta-learner can learn to perform supervised and unsupervised speaker adaptation and that it outperforms a strong baseline adapting LHUC parameters when adapting a DNN AM with 1.5M parameters. We also report initial experiments on adapting TDNN AMs, where the meta-learner achieves performance comparable to LHUC.
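For readers unfamiliar with the LHUC baseline, the following is a minimal sketch of the idea, not the paper's implementation. LHUC (learning hidden unit contributions) keeps the acoustic model weights frozen and learns one speaker-dependent parameter r per hidden unit, which rescales that unit's activation by an amplitude 2*sigmoid(r). The function names, the toy squared-error objective, the learning rate, and the example values are all assumptions made for illustration; a real system would minimise a frame-level classification loss on the speaker's adaptation data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lhuc_scale(hidden, r):
    """Rescale each hidden activation by a speaker-dependent amplitude 2*sigmoid(r)."""
    return [2.0 * sigmoid(r_i) * h_i for h_i, r_i in zip(hidden, r)]


def adapt_lhuc(hidden, target, r, lr=0.1, steps=50):
    """Toy adaptation loop (hypothetical objective): gradient descent on a
    squared error with respect to r only; the activations stand in for the
    output of frozen, speaker-independent weights."""
    r = list(r)
    for _ in range(steps):
        for i, (h, t) in enumerate(zip(hidden, target)):
            s = sigmoid(r[i])
            err = 2.0 * s * h - t
            # d/dr of 0.5 * err**2, using d(2*sigmoid(r))/dr = 2*s*(1-s)
            r[i] -= lr * err * 2.0 * h * s * (1.0 - s)
    return r


def squared_error(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target))


hidden = [0.5, 1.0, -0.3]      # illustrative activations for one frame
target = [0.75, 0.5, -0.3]     # illustrative targets, not real data
r0 = [0.0, 0.0, 0.0]           # r = 0 gives amplitude 1, i.e. the unadapted model

unadapted = lhuc_scale(hidden, r0)
r_adapted = adapt_lhuc(hidden, target, r0)
adapted = lhuc_scale(hidden, r_adapted)
```

Only as many parameters as there are hidden units are updated here, which is why LHUC is hard to overfit on small amounts of adaptation data. The approach studied in the paper instead adapts all weights of the acoustic model and relies on the meta-learner to produce suitable updates.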