Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
This work addresses semi-supervised speech recognition. Its contribution is incremental: it builds on existing pseudo-labeling and self-supervised learning methods.
The paper aims to improve pseudo-label training for end-to-end speech recognition with a Gradient Mask method. The method forces the model to predict from masked inputs, so that it learns robust acoustic representations without any extra loss function, and it achieves competitive results in the Librispeech 100-hour experiments.
In recent work on semi-supervised speech recognition, both self-supervised representation learning and pseudo-labeling have shown promising results. In this paper, we propose a novel approach that combines their ideas for end-to-end speech recognition models. Without any extra loss function, we use the Gradient Mask to optimize the model when training on pseudo-labels. This method forces the speech recognition model to predict from masked input, so that it learns strong acoustic representations and training becomes robust to label noise. In our semi-supervised experiments, the method improves model performance when training on pseudo-labels and achieves competitive results compared with other semi-supervised approaches on the Librispeech 100-hour experiments.
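The abstract does not spell out how the Gradient Mask works, so the following is only a minimal sketch of one plausible reading. It assumes that some input frames are masked and that, on pseudo-labeled data, only the gradients at those masked frames are kept. All function names, the span-masking scheme, and the default values are hypothetical and chosen purely for illustration.

```python
import random


def sample_frame_mask(num_frames, mask_prob=0.065, span=3, seed=0):
    """Return one 0/1 flag per frame; 1 marks a masked frame.

    Assumption: each frame starts a masked span of `span` frames
    with probability `mask_prob` (values are illustrative only).
    """
    rng = random.Random(seed)
    mask = [0] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = 1
    return mask


def mask_inputs(features, mask, fill=0.0):
    """Replace masked frames with a constant vector so the model
    has to predict from the surrounding context."""
    return [[fill] * len(frame) if m else list(frame)
            for frame, m in zip(features, mask)]


def mask_gradients(frame_grads, mask):
    """Keep per-frame gradients only where the input was masked
    and zero them elsewhere (assumed behaviour, not confirmed here)."""
    return [[g * m for g in grad] for grad, m in zip(frame_grads, mask)]


# Toy example: four frames with two features each, frames 1 and 2 masked.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = [0, 1, 1, 0]
grads = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

masked_features = mask_inputs(features, mask)
masked_grads = mask_gradients(grads, mask)
```

In a real system the per-frame gradients would come from an automatic differentiation framework rather than a hand-written list; the sketch only shows where a frame-level mask could enter on the input side and on the gradient side.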