CL · Nov 15, 2023

SiRA: Sparse Mixture of Low Rank Adaptation

Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng

arXiv:2311.09179v18.540 citations · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the need for more effective adaptation of large language models to downstream tasks, representing an incremental improvement over existing parameter-efficient tuning methods.

The authors tackle parameter-efficient tuning of large language models by proposing SiRA, a sparse mixture of low-rank adaptation that improves performance over LoRA and other mixture-of-experts methods in single-task and multitask settings.

Parameter-efficient tuning has been a prominent approach to adapting large language models to downstream tasks. Most previous work adds dense trainable parameters, where all parameters are used to adapt to a given task. Using LoRA as an example, we found this to be less effective empirically: introducing more trainable parameters does not help. Motivated by this, we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low-rank adaptation. SiRA leverages the Sparse Mixture of Experts (SMoE) to boost the performance of LoRA. Specifically, it enforces top-$k$ expert routing with a capacity limit that restricts the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of the gating network to reduce over-fitting. Through extensive experiments, we verify that SiRA performs better than LoRA and other mixture-of-experts approaches across different single-task and multitask settings.
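To make the routing idea in the abstract concrete, here is a minimal pure-Python sketch of top-$k$ routing with a per-expert capacity limit and dropout applied to the gate, followed by a weighted sum of low-rank updates. It is an illustration under stated assumptions, not the authors' implementation: the names `route_tokens` and `sira_delta` are hypothetical, tokens are served in sequence order, tokens that overflow an expert's capacity are skipped for that expert, expert dropout is modelled as masking gate logits, and gate weights are the raw softmax probabilities without renormalization over the chosen experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(gate_logits, k, capacity, dropout_rate=0.0, rng=None):
    """Assign each token to at most k experts under a per-expert capacity.

    gate_logits: one list of expert logits per token.
    Returns, per token, a list of (expert_index, gate_weight) pairs.
    """
    rng = rng or random.Random(0)
    num_experts = len(gate_logits[0])
    load = [0] * num_experts  # tokens accepted by each expert so far
    assignments = []
    for logits in gate_logits:
        # Assumed form of expert dropout: mask gate logits at random (training only).
        kept = [x if rng.random() >= dropout_rate else -math.inf for x in logits]
        if all(x == -math.inf for x in kept):
            kept = list(logits)  # fall back if every expert was dropped
        probs = softmax(kept)
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        chosen = []
        for e in ranked[:k]:
            # Skip dropped experts and experts that are already at capacity.
            if probs[e] > 0.0 and load[e] < capacity:
                load[e] += 1
                chosen.append((e, probs[e]))
        assignments.append(chosen)
    return assignments


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sira_delta(x, experts, chosen):
    """Gate-weighted sum of low-rank updates B_e (A_e x) over the chosen experts.

    experts: list of (A, B) pairs, with A of shape r x d_in and B of shape d_out x r.
    """
    out = [0.0] * len(experts[0][1])
    for e, g in chosen:
        a, b = experts[e]
        update = matvec(b, matvec(a, x))
        out = [o + g * u for o, u in zip(out, update)]
    return out
```

In this sketch the output of an adapted layer would be the frozen layer's output plus `sira_delta(...)` for each token. A token whose preferred experts are all full receives no update and passes through the frozen weights unchanged.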
