MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
This work helps machine learning practitioners balance performance, memory footprint, and computational efficiency in parameter-efficient fine-tuning. It is an incremental improvement over existing LoRA variants.
The paper tackles the slow convergence of Low-Rank Adaptation (LoRA) and the trade-offs among its variants in parameter-efficient fine-tuning. It proposes MiSS, which updates shards of the weight matrix using a single shared trainable matrix. This reduces optimization complexity without performance loss and places MiSS favorably on the Pareto frontier of memory usage, initialization overhead, and computational efficiency.
Low-Rank Adaptation (LoRA) is a widely adopted technique for parameter-efficient fine-tuning (PEFT), but its slow convergence has spurred the development of numerous variants. Nevertheless, existing methods often fail to improve performance, memory footprint, and computational efficiency simultaneously. To address this challenge, we revisit the causes of LoRA's slow convergence. Building on these insights, we propose Matrix Shard Sharing (MiSS), which updates shards of the original weight matrix using a single shared trainable matrix $\boldsymbol{D}$, initialized to zeros. To simultaneously ensure computational efficiency, low memory footprint, and scalable serving, we introduce MiSS$^e$. Both theoretical analysis and empirical results demonstrate that our method reduces optimization complexity without compromising performance, thereby achieving a more favorable trade-off among performance, memory, and efficiency. Furthermore, we conduct a comprehensive comparative analysis of various PEFT methods, evaluating their memory usage, initialization overhead, and computational efficiency. By mapping the Pareto frontier across these dimensions, we show that MiSS occupies a favorable position, effectively capturing the advantages of prior approaches.
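
The shard-sharing idea can be illustrated with a minimal NumPy sketch. The abstract does not specify how the shards are formed, so the sketch assumes that the weight matrix $\boldsymbol{W}$ of shape (d_in, d_out) is split along its input dimension into equal shards of r rows, and that every shard receives the same update $\boldsymbol{D}$ of shape (r, d_out). The function names `miss_delta`, `miss_forward_dense`, and `miss_forward_shared` are hypothetical and are not taken from the paper. The second forward pass shows one way the update might be applied without materializing the full weight delta; whether this matches MiSS$^e$ is an assumption.

```python
import numpy as np


def miss_delta(D, num_shards):
    """Build the full weight update by repeating D once per shard.

    Assumption: shards are equal-sized row blocks of W, so the update
    is D stacked `num_shards` times along the input dimension.
    """
    return np.tile(D, (num_shards, 1))


def miss_forward_dense(x, W, D):
    """Reference forward pass that materializes the updated weight."""
    d_in, r = W.shape[0], D.shape[0]
    if d_in % r != 0:
        raise ValueError("d_in must be divisible by the shard size r")
    return x @ (W + miss_delta(D, d_in // r))


def miss_forward_shared(x, W, D):
    """Equivalent forward pass that never builds the full update.

    Because every shard shares D, x @ tile(D) equals
    (sum of the r-wide input blocks of x) @ D.
    """
    d_in, r = W.shape[0], D.shape[0]
    if d_in % r != 0:
        raise ValueError("d_in must be divisible by the shard size r")
    # Split the last axis of x into (num_shards, r) and sum over shards.
    x_blocks = x.reshape(*x.shape[:-1], d_in // r, r)
    return x @ W + x_blocks.sum(axis=-2) @ D
```

Under these assumptions, initializing `D` to zeros leaves the layer output identical to that of the frozen weights, so training starts from the pretrained model's behavior. The number of trainable parameters is r × d_out, independent of the number of shards.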