Multi-Accent Adaptation based on Gate Mechanism
This work addresses the challenge of accent variability in speech recognition systems and offers a simplified adaptation approach, though the contribution is incremental because it builds on existing adaptation methods.
The paper tackles the problem of improving multi-accent speech recognition when only limited accented data is available. It proposes a gate mechanism that adapts a baseline model to multiple accents simultaneously, achieving a 9.8% average relative WER reduction over the baseline and 1.9% over accent-specific adaptation. When the accent label is not known in advance and is instead predicted by an accent classifier, the average relative WER reduction over the baseline is 5.1%.
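To make the quoted percentages concrete, the following is a minimal sketch of how a relative WER reduction is computed. It assumes that "average" means the mean of per-accent relative reductions, which the source does not state explicitly. The function names and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


def average_relative_reduction(pairs):
    """Mean of per-accent relative reductions (an assumed definition of 'average')."""
    reductions = [relative_wer_reduction(b, a) for b, a in pairs]
    return sum(reductions) / len(reductions)


# Hypothetical (baseline WER, adapted WER) pairs for two accents, in percent.
example = [(20.0, 18.0), (10.0, 9.5)]
print(relative_wer_reduction(20.0, 18.0))   # 10.0
print(average_relative_reduction(example))  # 7.5
```

Note that a relative reduction is measured against the baseline WER, so a drop from 20.0% to 18.0% WER is a 10% relative reduction, not a 2% one.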
When only a limited amount of accented speech data is available, the conventional approach to improving multi-accent speech recognition is accent-specific adaptation, which adapts the baseline model to each target accent independently. To simplify the adaptation procedure, we explore adapting the baseline model to multiple target accents simultaneously using mixed multi-accent data. To this end, we propose an accent-specific top layer with a gate mechanism (AST-G) to realize multi-accent adaptation. Compared with the baseline model and accent-specific adaptation, AST-G achieves 9.8% and 1.9% average relative WER reduction, respectively. In real-world applications, however, the accent label is not available in advance at inference time. We therefore use an accent classifier to predict the accent label. To jointly train the acoustic model and the accent classifier, we propose multi-task learning with a gate mechanism (MTL-G). Because the predicted accent label can be inaccurate, MTL-G performs worse than accent-specific adaptation. Nevertheless, compared with the baseline model, MTL-G achieves a 5.1% average relative WER reduction.
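The abstract does not spell out the layer equations, so the following is only an illustrative sketch of the two ideas, not the authors' implementation. It assumes a sigmoid gate computed from the shared hidden vector and a one-hot accent vector, an element-wise product of gate and hidden vector, one output matrix per accent, and a weighted sum of two cross-entropy terms for joint training. All names (`init_params`, `ast_g_forward`, `mtl_g_forward`, `joint_loss`, the parameter keys, and the loss weight) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(hidden, n_accents, n_outputs, seed=0):
    """Randomly initialised toy parameters (assumed shapes, not from the paper)."""
    rng = random.Random(seed)
    return {
        "W_g": rand_matrix(hidden, hidden, rng),      # gate weights on hidden vector
        "V_g": rand_matrix(hidden, n_accents, rng),   # gate weights on accent one-hot
        "b_g": [0.0] * hidden,                        # gate bias
        "top": [rand_matrix(n_outputs, hidden, rng)   # one top layer per accent
                for _ in range(n_accents)],
        "W_cls": rand_matrix(n_accents, hidden, rng), # accent classifier weights
    }


def ast_g_forward(h, accent_id, p):
    """Gate the shared hidden vector, then apply the accent-specific top layer."""
    n_accents = len(p["top"])
    a = [1.0 if i == accent_id else 0.0 for i in range(n_accents)]
    pre = [x + y + b for x, y, b in
           zip(matvec(p["W_g"], h), matvec(p["V_g"], a), p["b_g"])]
    gate = [sigmoid(x) for x in pre]
    gated = [g * x for g, x in zip(gate, h)]
    return softmax(matvec(p["top"][accent_id], gated))


def mtl_g_forward(h, p):
    """Predict the accent from h, then reuse the gated path with that prediction."""
    accent_post = softmax(matvec(p["W_cls"], h))
    accent_id = max(range(len(accent_post)), key=accent_post.__getitem__)
    return ast_g_forward(h, accent_id, p), accent_post


def joint_loss(output_post, target, accent_post, accent_label, weight=0.1):
    """Acoustic cross-entropy plus a weighted accent-classification cross-entropy."""
    return -math.log(output_post[target]) - weight * math.log(accent_post[accent_label])


params = init_params(hidden=4, n_accents=3, n_outputs=5)
h = [0.5, -0.2, 0.1, 0.9]                   # stand-in for a shared hidden activation
posterior = ast_g_forward(h, accent_id=1, p=params)   # accent label given
posterior_mtl, accent_post = mtl_g_forward(h, params) # accent label predicted
```

In this sketch, a wrong accent prediction in `mtl_g_forward` routes the frame through the wrong gate and top layer, which is one way to picture why a classifier-driven system can trail a system that is given the true accent label.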