GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR
This work addresses automatic speech recognition (ASR) for dialectal speech, offering an incremental improvement through parameter-efficient adaptation.
The paper tackles automatic speech recognition in dialect-heavy settings by proposing GLoRIA, a parameter-efficient adaptation framework that uses metadata such as coordinates to modulate low-rank updates. On the GCND corpus it achieves state-of-the-art word error rates while updating under 10% of parameters.
Automatic Speech Recognition (ASR) in dialect-heavy settings remains challenging due to strong regional variation and limited labeled data. We propose GLoRIA, a parameter-efficient adaptation framework that leverages metadata (e.g., coordinates) to modulate low-rank updates in a pre-trained encoder. GLoRIA injects low-rank matrices into each feed-forward layer, with a gating MLP determining the non-negative contribution of each LoRA rank-1 component based on location metadata. On the GCND corpus, GLoRIA outperforms geo-conditioned full fine-tuning, LoRA, and both dialect-specific and unified full fine-tuning, achieving state-of-the-art word error rates while updating under 10% of parameters. GLoRIA also generalizes well to unseen dialects, including in extrapolation scenarios, and enables interpretable adaptation patterns that can be visualized geospatially. These results show metadata-gated low-rank adaptation is an effective, interpretable, and efficient solution for dialectal ASR.
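The gating mechanism described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the two-layer gating MLP, the tanh hidden activation, the softplus used to keep gates non-negative, and every name and dimension (`gate`, `gated_lora_forward`, `coords`, `rank`, and so on) are assumptions made for illustration. The sketch computes y = W x + B diag(g(m)) A x, where W is the frozen feed-forward weight, A and B are the low-rank factors, and g(m) is a vector with one non-negative gate per rank-1 component, computed from the location metadata m.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def softplus(z):
    """Numerically stable softplus; the output is never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gate(coords, W1, b1, W2, b2):
    """Hypothetical gating MLP: metadata -> one non-negative gate per rank."""
    hidden = [math.tanh(h + b) for h, b in zip(matvec(W1, coords), b1)]
    return [softplus(o + b) for o, b in zip(matvec(W2, hidden), b2)]


def gated_lora_forward(x, W, A, B, g):
    """Return W x + B diag(g) A x for a single input vector x."""
    base = matvec(W, x)                           # frozen pre-trained path
    z = matvec(A, x)                              # project down to rank r
    z = [g_k * z_k for g_k, z_k in zip(g, z)]     # scale each rank-1 component
    delta = matvec(B, z)                          # project back up
    return [b + d for b, d in zip(base, delta)]


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions, chosen only for this example.
d_in, d_out, rank, hidden = 4, 3, 2, 5
W = rand_matrix(d_out, d_in)      # frozen weight
A = rand_matrix(rank, d_in)       # trainable low-rank factor
B = rand_matrix(d_out, rank)      # trainable low-rank factor
W1, b1 = rand_matrix(hidden, 2), [0.0] * hidden
W2, b2 = rand_matrix(rank, hidden), [0.0] * rank

coords = [0.51, 0.04]             # hypothetical normalized (lat, lon)
x = [1.0, -0.5, 0.25, 2.0]

g = gate(coords, W1, b1, W2, b2)
y = gated_lora_forward(x, W, A, B, g)
```

Because each gate scales exactly one rank-1 component, evaluating `gate` over a grid of coordinates gives one map per component, which is the kind of geospatial visualization the abstract refers to. Setting every gate to zero recovers the frozen model's output.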