CL · AI · Feb 25, 2024

LoRA Meets Dropout under a Unified Framework

Oxford
arXiv:2403.00812v2 · 30 citations · h-index: 7 · ACL
Originality: Incremental advance
AI Analysis

This work addresses a practical problem for NLP researchers and practitioners by improving parameter-efficient fine-tuning of large language models, though it is an incremental advance that builds on existing dropout and LoRA techniques.

The paper tackles the apparent contradiction between parameter-efficient fine-tuning (LoRA), which trains very few parameters, and existing dropout methods, which were designed to curb overfitting caused by excessive parameter redundancy. It shows that LoRA is nevertheless prone to overfitting and introduces a unified framework for analyzing dropout methods, leading to a new method called HiddenKey that achieves superior performance across models and tasks.
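To make "very few trainable parameters" concrete, here is a minimal NumPy sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation (a frozen weight W0 plus a low-rank update BA scaled by alpha / r) and, as in some LoRA implementations, applies dropout to the input of the low-rank branch only. This is not the paper's code: `lora_param_counts`, `LoRALinear`, the rank, scaling and dropout rate are all illustrative assumptions.

```python
import numpy as np


def lora_param_counts(d_in, d_out, r):
    """Return (trainable, frozen) parameter counts for one LoRA-adapted matrix."""
    trainable = r * d_in + d_out * r  # A is (r, d_in), B is (d_out, r)
    frozen = d_out * d_in             # pretrained W0 stays fixed
    return trainable, frozen


class LoRALinear:
    """Sketch of y = x W0^T + (alpha / r) * dropout(x) A^T B^T with W0 frozen."""

    def __init__(self, d_in, d_out, r=8, alpha=16, p=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight; random here purely for illustration.
        self.W0 = self.rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors; B starts at zero so the update BA is initially 0.
        self.A = self.rng.standard_normal((r, d_in)) * 0.01
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r
        self.p = p  # dropout rate on the low-rank branch input (assumed placement)

    def trainable_params(self):
        return self.A.size + self.B.size

    def forward(self, x, training=True):
        h = x
        if training and self.p > 0:
            # Inverted dropout: zero each entry with prob p, rescale survivors by 1/(1-p).
            keep = self.rng.random(x.shape) >= self.p
            h = x * keep / (1.0 - self.p)
        # Frozen path sees the clean input; only the LoRA branch sees the dropped one.
        return x @ self.W0.T + self.scale * (h @ self.A.T @ self.B.T)


trainable, frozen = lora_param_counts(4096, 4096, r=8)
print(trainable, frozen, f"{trainable / frozen:.2%}")  # 65536 16777216 0.39%
```

For a 4096 × 4096 projection with rank 8, the low-rank factors hold about 0.39% as many parameters as the frozen matrix. That small trainable budget is why it is not obvious a priori that LoRA would overfit or that dropout would still help.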

With their remarkable capabilities, large language models (LLMs) have emerged as essential components of numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach to model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all parameters updated, alleviate the overfitting associated with excessive parameter redundancy. Hence, a possible contradiction arises between the negligible number of trainable parameters in LoRA and the effectiveness of previous dropout methods, and it has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also overfitting-prone. We then revisit transformer-specific dropout methods and establish their equivalences and distinctions mathematically and empirically. Building on this comparative analysis, we introduce a unified framework for a comprehensive investigation, which instantiates these methods by dropping position, structural pattern and compensation measure. Through this framework, we reveal their new preferences and relative performance when only a limited number of parameters are trainable. The framework also allows us to combine the most favorable aspects into a novel dropout method named HiddenKey. Extensive experiments verify the remarkable superiority and sufficiency of HiddenKey across multiple models and tasks, which highlights it as the preferred approach for high-performance and parameter-efficient finetuning of LLMs.
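The three axes of the framework (dropping position, structural pattern, compensation measure) can be illustrated with a single-head attention sketch. This is an illustration under our own assumptions, not the paper's instantiation or HiddenKey itself. The function names are hypothetical, and so is the specific menu of choices: three positions (attention logits, attention weights, attention output), two patterns (independent elements vs. whole key columns), and two compensations (softmax renormalization vs. 1/(1-p) rescaling). Training-time compensation through an extra loss term is not shown.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stable; -inf entries map to weight 0
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def keep_mask(shape, p, pattern, rng):
    """Boolean keep-mask for a 2-D array of the given shape.

    'element': every entry is dropped independently with probability p.
    'column':  whole columns are dropped, e.g. one key hidden from every query.
    """
    if pattern == "element":
        return rng.random(shape) >= p
    if pattern == "column":
        return np.broadcast_to(rng.random((1, shape[-1])) >= p, shape)
    raise ValueError(f"unknown pattern: {pattern!r}")


def attention(q, k, v, p=0.1, position=None, pattern="element", rng=None):
    """Single-head attention with dropout at one configurable position.

    position: None, 'logits' (before softmax), 'weights' (after softmax)
    or 'hidden' (on the attention output).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    if position == "logits":
        keep = keep_mask(logits.shape, p, pattern, rng)
        # Guard: if a query would lose every key, keep that whole row instead.
        keep = keep | ~keep.any(axis=-1, keepdims=True)
        # Compensation by renormalization: softmax spreads mass over surviving keys.
        logits = np.where(keep, logits, -np.inf)
    weights = softmax(logits)
    if position == "weights":
        keep = keep_mask(weights.shape, p, pattern, rng)
        # Compensation by rescaling, so each weight keeps its expected value.
        weights = weights * keep / (1.0 - p)
    out = weights @ v
    if position == "hidden":
        keep = keep_mask(out.shape, p, pattern, rng)
        out = out * keep / (1.0 - p)  # ordinary inverted dropout on the output
    return out
```

For example, `attention(q, k, v, p=0.1, position="logits", pattern="column")` hides each key from every query with probability 0.1 and relies on the softmax to renormalize. `position="hidden", pattern="element"` behaves like ordinary dropout on the attention output. Comparing such configurations under a fixed parameter budget is the kind of question the framework is meant to answer.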
