CL · SD · AS · Jan 4, 2024

PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques

Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-yi Lee

arXiv:2401.02122v2 · 3.4 · 11 citations · h-index: 4 · 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Originality: Synthesis-oriented

AI Analysis

This work addresses the optimization of PEFT methods for speech processing, but it is incremental: it builds on existing PEFT and ensemble techniques without introducing new paradigms.

The study tackles the optimization of Parameter-Efficient Fine-Tuning (PEFT) methods for speech processing. It compares layer-wise PEFT placements found with Differentiable Architecture Search (DARTS) against ensembles of different PEFT methods. It finds that a majority-voting ensemble outperforms both the DARTS-searched placements and the baseline of inserting the same PEFT method into every layer.
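The DARTS-based placement search described above relaxes the discrete choice of which PEFT module to use in each layer into a softmax-weighted mixture of candidate modules. After the search, each layer keeps its highest-weighted candidate. The pure-Python sketch below shows only that relaxation and the final per-layer argmax. It rests on several assumptions. The candidate names (`none`, `adapter`, `lora`), the toy functions standing in for trained modules, and the helpers `mixed_op` and `discretize` are all hypothetical. The bi-level optimization of the architecture weights on validation data is omitted. This is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

# A candidate maps a hidden vector to a hidden vector.
Candidate = Callable[[List[float]], List[float]]

# Hypothetical toy stand-ins for PEFT modules; real modules have trainable weights.
candidates: Dict[str, Candidate] = {
    "none": lambda h: list(h),                    # no PEFT module in this layer
    "adapter": lambda h: [x + 0.1 * x for x in h],  # toy residual update
    "lora": lambda h: [x + 0.05 for x in h],        # toy additive update
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(h: List[float], cands: Dict[str, Candidate],
             alpha: Dict[str, float]) -> List[float]:
    """DARTS-style relaxation: softmax(alpha)-weighted sum of candidate outputs."""
    names = list(cands)
    weights = softmax([alpha[n] for n in names])
    outputs = [cands[n](h) for n in names]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(h))]


def discretize(alphas_per_layer: List[Dict[str, float]]) -> List[str]:
    """After search, keep the highest-weighted candidate in each layer."""
    return [max(alpha, key=alpha.get) for alpha in alphas_per_layer]


if __name__ == "__main__":
    layer_alphas = [
        {"none": 0.0, "adapter": 1.2, "lora": 0.3},
        {"none": 0.5, "adapter": -0.2, "lora": 0.9},
    ]
    print(discretize(layer_alphas))  # ['adapter', 'lora']
```

The baseline reported in the abstract corresponds to skipping the search entirely and using the same candidate in every layer.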

Parameter-Efficient Fine-Tuning (PEFT) is increasingly recognized as an effective method in speech processing. However, the optimal approach and the placement of PEFT methods remain inconclusive. Our study conducts extensive experiments to compare different PEFT methods and their layer-wise placement by adapting Differentiable Architecture Search (DARTS). We also explore the use of ensemble learning to leverage diverse PEFT strategies. The results reveal that DARTS does not outperform the baseline approach, which involves inserting the same PEFT method into all layers of a Self-Supervised Learning (SSL) model. In contrast, an ensemble learning approach, particularly one employing majority voting, demonstrates superior performance. Our statistical evidence indicates that different PEFT methods learn in varied ways. This variation might explain why the synergistic integration of various PEFT methods through ensemble learning can harness their unique learning capabilities more effectively than individual layer-wise optimization.
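Majority voting, the ensemble strategy the abstract singles out, combines the per-utterance predictions of several models, here each fine-tuned with a different PEFT method. The sketch below assumes a classification-style task where every model outputs one label per utterance. The model names, the labels, and the tie-breaking rule (earliest-listed model wins among tied labels) are illustrative choices, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Dict[str, Sequence[str]]) -> List[str]:
    """Combine per-utterance labels from several PEFT-tuned models by majority vote.

    `predictions` maps a (hypothetical) model name to its predicted labels,
    one per utterance; all lists must have the same length. Among tied labels,
    the one predicted by the model listed first in `predictions` wins.
    """
    names = list(predictions)
    n = len(predictions[names[0]])
    if any(len(p) != n for p in predictions.values()):
        raise ValueError("all models must predict the same number of utterances")

    combined: List[str] = []
    for i in range(n):
        votes = [predictions[name][i] for name in names]  # votes in model order
        counts = Counter(votes)
        top = max(counts.values())
        # First vote (in model order) whose label reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


if __name__ == "__main__":
    preds = {  # hypothetical outputs from three PEFT-tuned models
        "adapter": ["happy", "sad", "angry"],
        "lora": ["happy", "neutral", "angry"],
        "prefix": ["sad", "neutral", "happy"],
    }
    print(majority_vote(preds))  # ['happy', 'neutral', 'angry']
```

Because each model votes independently, the ensemble can exploit the kind of disagreement between PEFT methods that the abstract's statistical evidence points to, which a single layer-wise placement cannot.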
