AS LG SDJan 27, 2025

UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification

arXiv:2501.16542v13.34 citationsh-index: 6IEEE Transactions on Audio, Speech, and Language Processing

Originality Incremental advance

AI Analysis

This work addresses the computational and overfitting issues in fine-tuning large speech models for speaker verification, offering an incremental improvement in parameter efficiency.

The study tackled the challenge of adapting large pre-trained speech models to speaker verification efficiently by proposing a unified parameter-efficient tuning framework, UniPET-SPK, which outperformed fine-tuning and other methods while updating only 5.4% of parameters.

With excellent generalization ability, SSL speech models have shown impressive performance on various downstream tasks in the pre-training and fine-tuning paradigm. However, as the size of pre-trained models grows, fine-tuning becomes practically unfeasible due to expanding computation and storage requirements and the risk of overfitting. This study explores parameter-efficient tuning (PET) methods for adapting large-scale pre-trained SSL speech models to speaker verification task. Correspondingly, we propose three PET methods: (i)an adapter-tuning method, (ii)a prompt-tuning method, and (iii)a unified framework that effectively incorporates adapter-tuning and prompt-tuning with a dynamically learnable gating mechanism. First, we propose the Inner+Inter Adapter framework, which inserts two types of adapters into pre-trained models, allowing for adaptation of latent features within the intermediate Transformer layers and output embeddings from all Transformer layers, through a parallel adapter design. Second, we propose the Deep Speaker Prompting method that concatenates trainable prompt tokens into the input space of pre-trained models to guide adaptation. Lastly, we propose the UniPET-SPK, a unified framework that effectively incorporates these two alternate PET methods into a single framework with a dynamic trainable gating mechanism. The proposed UniPET-SPK learns to find the optimal mixture of PET methods to match different datasets and scenarios. We conduct a comprehensive set of experiments on several datasets to validate the effectiveness of the proposed PET methods. Experimental results on VoxCeleb, CN-Celeb, and 1st 48-UTD forensic datasets demonstrate that the proposed UniPET-SPK consistently outperforms the two PET methods, fine-tuning, and other parameter-efficient tuning methods, achieving superior performance while updating only 5.4% of the parameters.

View on arXiv PDF

Similar