AS AI CL | Apr 24, 2024

Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

arXiv:2406.02562v14.33 citations | h-index: 13 | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient, personalized code-switching ASR on low-spec devices and represents an incremental advance in parameter-efficient fine-tuning.

This paper tackles two problems in automatic speech recognition: personalized large models are inefficient on low-spec devices, and code-switching recognition is limited. It proposes a weights separation method and a gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning, and reports performance improvements over traditional models and conventional LoRA on Korean-English datasets.

In recent times, there has been growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model on-device is inefficient, and sometimes limited, due to computational cost. To tackle this problem, this paper presents a weights separation method that minimizes on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages within a single utterance, known as code-switching, and a personalized ASR model is necessary to address such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce a gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning with minimal performance degradation. Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA enhances parameter-efficient fine-tuning performance compared to conventional LoRA.
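
To make the idea of a gated low-rank update concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the gate, so the scalar sigmoid gate (`gate_logit`) is an assumption, as are the class name `GatedLoRALinear` and the helper `adapter_state`. The sketch also illustrates one plausible reading of weights separation, in which the frozen base weights are shared and only the small adapter is stored per user.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class GatedLoRALinear:
    """Frozen base weight W plus a gated low-rank update g * (B @ A).

    Assumption: the gate g is sigmoid(gate_logit) for a single learnable
    scalar. The paper's actual gate formulation may differ.
    """

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen base weights, shared across users
        # Standard LoRA initialization: A is small random, B is zero,
        # so the low-rank update contributes nothing at the start.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.gate_logit = 0.0  # hypothetical learnable gate parameter

    def gate(self):
        """Squash the gate parameter into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-self.gate_logit))

    def forward(self, x):
        base = matvec(self.W, x)                      # W x
        low_rank = matvec(self.B, matvec(self.A, x))  # B (A x)
        g = self.gate()
        return [b + g * l for b, l in zip(base, low_rank)]

    def adapter_state(self):
        """Per-user parameters kept separate from the shared base weights."""
        return {"A": self.A, "B": self.B, "gate_logit": self.gate_logit}

    def num_adapter_params(self):
        rank = len(self.A)
        return rank * len(self.A[0]) + len(self.B) * rank + 1


# Usage: a 4x4 identity base layer with a rank-1 adapter.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = GatedLoRALinear(identity, rank=1)
output = layer.forward([2.0, 0.0, 0.0, 0.0])
```

In this sketch the adapter holds `rank * d_in + d_out * rank + 1` values, compared with `d_out * d_in` for the base layer, which is why storing only the adapter per user keeps the personalized part small when the rank is low.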
