CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
This work addresses the noise robustness of ASR in classroom environments, which could aid teachers and students, but the contribution is incremental: it applies an existing adaptation technique to a specific domain.
The paper tackles the problem of making Automatic Speech Recognition (ASR) systems robust to classroom noise by using continued pretraining (CPT) to adapt Wav2vec2.0, which reduces Word Error Rate (WER) by upwards of 10%.
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones and classroom conditions.
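For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical, and the whitespace tokenization with no text normalization is an assumption; this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and no normalization
    (casing, punctuation) is applied before comparison.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical classroom utterance: one of four reference words is dropped.
    print(word_error_rate("please open your books", "please open books"))
```

Because the denominator is the reference length, insertions can push WER above 1.0, which is worth keeping in mind when comparing models on very noisy recordings.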