AS CL SP · Oct 4, 2023

Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

Kai-Wei Chang, Ming-Hsin Chen, Yun-Ping Lin, Jing Neng Hsu, Paul Kuo-Ming Huang, Chien-yu Huang, Shang-Wen Li, Hung-yi Lee

Meta AI, MIT

arXiv:2310.02971v3 · 4.37 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses efficient adaptation methods for speech models in low-resource and cross-lingual scenarios, showing incremental improvements over existing techniques.

The paper tackles the problem of applying prompting and adapter tuning to self-supervised encoder-decoder speech models for complex sequence generation tasks, achieving a 53% relative improvement in word error rate (WER) for ASR and a 27% improvement in F1 score for slot filling.
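For readers less used to relative metrics, the standard definition of a relative WER improvement is shown below. Since lower WER is better, a 53% relative improvement means the new WER is 47% of the baseline WER, not 53 percentage points lower. This is general arithmetic, not a reproduction of the paper's reported numbers.

```latex
\Delta_{\text{rel}}
  = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{base}}}
  = 0.53
\quad\Longrightarrow\quad
\mathrm{WER}_{\text{new}} = (1 - 0.53)\,\mathrm{WER}_{\text{base}} = 0.47\,\mathrm{WER}_{\text{base}}
```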

Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT). However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. In addition, adapter tuning has primarily been applied to encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and a 27% improvement in F1 score for slot filling. Additionally, prompting is competitive with FT in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When the number of trainable parameters is limited, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning.
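Prompting and adapter tuning differ mainly in where the trainable parameters live while the pre-trained backbone stays frozen. Below is a toy sketch in Python with NumPy, not the authors' implementation. `FrozenLayer`, `Adapter`, `forward_prompting`, `forward_adapter`, `trainable_params`, and all sizes are hypothetical. It assumes input-level prompt prepending and one bottleneck adapter per layer. The actual prompt placement and adapter configuration used with Wav2Seq's encoder and decoder may differ. No training loop is included. The sketch only contrasts the forward passes and counts which parameters each method would update.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64     # hypothetical hidden size
N_LAYERS = 4     # hypothetical number of frozen pre-trained layers
N_PROMPT = 8     # hypothetical number of trainable prompt vectors
BOTTLENECK = 16  # hypothetical adapter bottleneck width


class FrozenLayer:
    """Stand-in for one pre-trained layer; its weights are never updated."""

    def __init__(self, d):
        self.w = rng.standard_normal((d, d)) / np.sqrt(d)

    def __call__(self, x):
        return np.tanh(x @ self.w)

    def num_params(self):
        return self.w.size


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, d, r):
        self.down = rng.standard_normal((d, r)) * 0.01
        # Zero-initialised up-projection makes the adapter an identity map at start.
        self.up = np.zeros((r, d))

    def __call__(self, h):
        return h + np.maximum(h @ self.down, 0.0) @ self.up

    def num_params(self):
        return self.down.size + self.up.size


backbone = [FrozenLayer(D_MODEL) for _ in range(N_LAYERS)]
prompt = rng.standard_normal((N_PROMPT, D_MODEL)) * 0.01            # trained in prompting
adapters = [Adapter(D_MODEL, BOTTLENECK) for _ in range(N_LAYERS)]  # trained in adapter tuning


def forward_prompting(x):
    """Prompting: prepend trainable vectors to the input, then run the frozen stack."""
    h = np.concatenate([prompt, x], axis=0)
    for layer in backbone:
        h = layer(h)
    return h


def forward_adapter(x):
    """Adapter tuning: apply a trainable adapter after every frozen layer."""
    h = x
    for layer, adapter in zip(backbone, adapters):
        h = adapter(layer(h))
    return h


def trainable_params(method):
    """Count the parameters that would receive gradients under each method."""
    if method == "prompting":
        return prompt.size
    if method == "adapter":
        return sum(a.num_params() for a in adapters)
    if method == "finetune":
        return sum(layer.num_params() for layer in backbone)
    raise ValueError(f"unknown method: {method}")


x = rng.standard_normal((50, D_MODEL))  # 50 hypothetical speech frames
print(forward_prompting(x).shape)  # (58, 64): prompt vectors + frames
print(forward_adapter(x).shape)    # (50, 64)
for m in ("prompting", "adapter", "finetune"):
    print(m, trainable_params(m))  # prompting 512, adapter 8192, finetune 16384
```

In this toy setting, prompting trains the fewest parameters because it touches only the prepended vectors. Adapters add a small module per layer, and full FT updates the whole backbone. The zero-initialised up-projection is a common adapter design choice. It makes the adapted model start out identical to the frozen one.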

