LG CL SD AS · Oct 8, 2021

Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training

Lillian Zhou, Dhruv Guliani, Andreas Kabel, Giovanni Motta, Françoise Beaufays

arXiv:2110.04267v2 · 3.11 citations

Originality: Incremental advance

AI Analysis

This work addresses training efficiency for automatic speech recognition (ASR) models, though it appears incremental because it builds on existing research into layer importance.

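The paper analyzes layer importance in Conformer ASR models, identifying ambient layers (layers whose trained weights can be reset to their initial values with little loss in quality) and studying how stable these layers are across training runs and model sizes. It then applies these findings to Federated Learning, targeting Federated Dropout to layers according to their importance so that the model each client optimizes is smaller, without quality degradation.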
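Ambient layers are usually identified with a re-initialization probe: reset one layer of the trained model to its initial weights, keep every other layer trained, and measure how much the error metric rises. The Python sketch below illustrates that probe under stated assumptions; it is not the authors' code. The weight layout (`Weights`, a mapping from layer name to a flat parameter list), the `evaluate` callback (assumed to return an error rate such as WER, lower is better), and the `tolerance` threshold are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical layout: layer name -> flattened parameters of that layer.
Weights = Dict[str, List[float]]


def find_ambient_layers(
    initial: Weights,
    trained: Weights,
    evaluate: Callable[[Weights], float],
    tolerance: float,
) -> List[str]:
    """Return layers whose reset to initial values raises the error by at most `tolerance`.

    `evaluate` is an assumed callback returning an error rate (lower is better).
    """
    baseline = evaluate(trained)
    ambient = []
    for name in trained:
        # Shallow-copy the trained model, then reset only this one layer.
        probe = dict(trained)
        probe[name] = list(initial[name])
        if evaluate(probe) - baseline <= tolerance:
            ambient.append(name)
    return ambient


# Toy usage: the hypothetical metric depends only on layer "b".
initial = {"a": [0.0, 0.0], "b": [0.0, 0.0], "c": [0.0]}
trained = {"a": [0.1, -0.2], "b": [1.0, 2.0], "c": [0.05]}


def toy_wer(w: Weights) -> float:
    # Error falls as layer "b" moves away from zero; other layers are ignored.
    return 0.5 - 0.1 * sum(w["b"])


print(find_ambient_layers(initial, trained, toy_wer, tolerance=0.01))  # ['a', 'c']
```

In practice `evaluate` would decode a held-out set with the probed model; the toy metric here depends only on layer `b`, so resetting `a` or `c` leaves it unchanged and those layers are reported as ambient.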

Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate that the state-of-the-art Conformer models generally have multiple ambient layers. We study the stability of these layers across runs and model sizes, propose that group normalization may be used without disrupting their formation, and examine their correlation with model weight updates in each layer. Finally, we apply these findings to Federated Learning in order to improve the training procedure, by targeting Federated Dropout to layers by importance. This allows us to reduce the model size optimized by clients without quality degradation, and shows potential for future exploration.
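Federated Dropout reduces what a client trains by sending it a sub-model with some units removed; the server later scatters the client's update back into the full model. Targeting it by layer importance means dropping units only in a chosen subset of layers. The sketch below assumes a plain stack of dense layers (a simplification of a Conformer) and uses hypothetical names (`extract_submodel`, `merge_update`, `target_layers`, `keep_fraction`); it shows the mechanics only and does not encode which layers the paper chose to target.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical dense-layer layout: matrix[out_unit][in_unit].
Matrix = List[List[float]]
Layer = Tuple[str, Matrix]
IndexMap = List[Tuple[List[int], List[int]]]  # (kept rows, kept cols) per layer


def extract_submodel(
    layers: List[Layer], target_layers: Set[str], keep_fraction: float, rng: random.Random
) -> Tuple[List[Layer], IndexMap]:
    """Drop output units only in `target_layers`; the final layer keeps all outputs.

    Dropping an output unit of layer l also removes the matching input column
    of layer l + 1, so the sub-model stays shape-consistent.
    """
    sub: List[Layer] = []
    index_map: IndexMap = []
    in_idx: Optional[List[int]] = None
    for pos, (name, w) in enumerate(layers):
        all_rows = list(range(len(w)))
        cols = in_idx if in_idx is not None else list(range(len(w[0])))
        if name in target_layers and pos < len(layers) - 1:
            n_keep = max(1, round(keep_fraction * len(w)))
            rows = sorted(rng.sample(all_rows, n_keep))
        else:
            rows = all_rows
        sub.append((name, [[w[r][c] for c in cols] for r in rows]))
        index_map.append((rows, cols))
        in_idx = rows  # next layer's inputs are this layer's kept outputs
    return sub, index_map


def merge_update(layers: List[Layer], sub_update: List[Layer], index_map: IndexMap) -> List[Layer]:
    """Scatter a client's sub-model update into a full-size update (zeros elsewhere)."""
    full: List[Layer] = []
    for (name, w), (_, du), (rows, cols) in zip(layers, sub_update, index_map):
        delta = [[0.0] * len(w[0]) for _ in w]
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                delta[r][c] = du[i][j]
        full.append((name, delta))
    return full


# Toy usage: drop half of the output units of "enc0" only.
layers = [
    ("enc0", [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]),  # 4 x 2
    ("enc1", [[1.0] * 4 for _ in range(4)]),  # 4 x 4
    ("out", [[1.0] * 4 for _ in range(3)]),  # 3 x 4
]
sub, idx = extract_submodel(layers, {"enc0"}, keep_fraction=0.5, rng=random.Random(0))
print([(n, len(m), len(m[0])) for n, m in sub])
# [('enc0', 2, 2), ('enc1', 4, 2), ('out', 3, 4)]
```

Because dropping an output unit of one dense layer removes the matching input column of the next, the index map records both row and column indices for every layer, which is what lets `merge_update` place each client value back at its original position.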

