LG · AI · Aug 12, 2023

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr

arXiv:2308.06522v132.7133 citations · h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses resource constraints and data diversity in federated learning for NLP and offers a practical solution for edge devices. The advance is incremental, since it builds on existing PEFT methods such as LoRA.

The paper tackles the challenge of efficiently fine-tuning large language models in federated learning settings with heterogeneous data. It proposes SLoRA, which achieves performance comparable to full fine-tuning using sparse updates of roughly 1% density and reduces training time by up to 90%.

Transfer learning via fine-tuning pre-trained transformer models has gained significant success in delivering state-of-the-art results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular transformer models, efficient fine-tuning is crucial to make federated training feasible. This work explores the opportunities and challenges associated with applying parameter efficient fine-tuning (PEFT) methods in different FL settings for language tasks. Specifically, our investigation reveals that as the data across users becomes more diverse, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose a method called SLoRA, which overcomes the key limitations of LoRA in highly heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, with significantly sparse updates of approximately $1\%$ density, while reducing training time by up to $90\%$.
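To give a rough sense of why a low-rank adapter touches only about 1% of a layer's parameters, the sketch below counts the trainable parameters of a generic LoRA update on a single weight matrix. It illustrates plain LoRA only, not the SLoRA initialization, and the layer size (768 x 768) and rank (4) are hypothetical values chosen for the arithmetic, not settings taken from the paper.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of a d_out x d_in weight matrix's parameter count that a
    rank-`rank` LoRA adapter trains.

    LoRA keeps the pre-trained weight W frozen and learns an update B @ A,
    where B is d_out x rank and A is rank x d_in.
    """
    full_params = d_out * d_in               # parameters in the frozen weight W
    adapter_params = rank * (d_out + d_in)   # parameters in B and A combined
    return adapter_params / full_params


# Hypothetical layer shape and rank, chosen only for illustration.
fraction = lora_trainable_fraction(d_out=768, d_in=768, rank=4)
print(f"trainable fraction: {fraction:.4f}")  # prints "trainable fraction: 0.0104"
```

In a federated setting, this fraction also bounds the share of that layer's parameters a client would need to communicate per round if only the adapter is exchanged.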

