LG · AI · Nov 26, 2024

CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning

arXiv:2411.17426v3 · 4 citations · h-index: 6
Originality: Highly original
AI Analysis

This paper addresses memory efficiency in large language models, offering AI practitioners a novel approach to pruning and fine-tuning with concrete reported gains.

The paper tackles the memory-bound inference problem in decoder-only models by introducing CLOVER, a method that applies SVD to the Q-K and V-O pairs within each attention head and uses the resulting singular values for either pruning or fine-tuning. For pruning, removing 70% of the Q-K pairs in GPT-2 XL yields perplexity similar to removing only 8% with vanilla methods. For fine-tuning, CLOVER outperforms state-of-the-art methods by up to 7.6% on commonsense tasks for LLaMA-2 7B.
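The pruning idea summarized above can be illustrated for a single attention head with a minimal NumPy sketch. It is not the authors' implementation: the function name `clover_prune_qk`, the toy dimensions, and the choice to fold the singular values into the query side are assumptions made for illustration, and biases and positional encodings are ignored.

```python
import numpy as np

def clover_prune_qk(w_q, w_k, rank):
    """Hypothetical helper: prune one head's Q-K pair to `rank` dimensions.

    w_q, w_k: (d_model, d_head) projection matrices of a single head.
    Returns the pruned pair and all singular values of w_q @ w_k.T.
    """
    # The per-head product has rank at most d_head, so its SVD exposes
    # d_head orthogonal directions ordered by importance.
    u, s, vt = np.linalg.svd(w_q @ w_k.T, full_matrices=False)
    # Keep the top `rank` directions and merge the singular values back
    # into the query side, so no extra parameters are introduced.
    w_q_new = u[:, :rank] * s[:rank]
    w_k_new = vt[:rank, :].T
    return w_q_new, w_k_new, s

rng = np.random.default_rng(0)
d_model, d_head = 64, 16
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))
target = w_q @ w_k.T

# Keeping all d_head directions reproduces the original product.
q_full, k_full, s_all = clover_prune_qk(w_q, w_k, d_head)
full_err = np.linalg.norm(q_full @ k_full.T - target)

# Keeping half of them discards only the smallest singular values.
q_small, k_small, _ = clover_prune_qk(w_q, w_k, 8)
pruned_err = np.linalg.norm(q_small @ k_small.T - target)
```

In this sketch the Frobenius error after pruning equals the root sum of squares of the discarded singular values, which is why the singular values are a natural guide for deciding how much to prune.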

Decoder-only models generate tokens autoregressively by caching key/value vectors, but as the cache grows, inference becomes memory-bound. To address this issue, we introduce CLOVER (Cross-Layer Orthogonal Vectors), a novel approach that treats pairs of attention layers as a set of low-rank decompositions. CLOVER applies Singular Value Decomposition (SVD) to the \( Q \)-\( K \) and \( V \)-\( O \) pairs within each attention head. The resulting singular values can either guide pruning or serve as trainable parameters for efficient fine-tuning of all orthogonal vectors. After pruning or fine-tuning, these values are reintegrated into the model without increasing its parameter count. We apply CLOVER to various models, including GPT-2 XL, DeepSeek-V2-Lite, Whisper-Large-v3, Stable Diffusion XL, and LLaMA-3.2-11B-Vision. Our results demonstrate that CLOVER significantly improves pruning efficiency. For instance, the perplexity of pruning 70\% of the \( Q \)-\( K \) pairs in GPT-2 XL is similar to that of pruning just 8\% with vanilla methods. Fine-tuning the singular values further results in a full-rank update, outperforming state-of-the-art methods (LoRA, DoRA, HiRA, and PiSSA) by 7.6\%, 5.5\%, 3.8\%, and 0.7\%, respectively, on eight commonsense tasks for LLaMA-2 7B.
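The fine-tuning variant described in the abstract can be sketched in the same spirit. The following toy example, under stated assumptions, freezes the orthogonal vectors of one head's V-O product, trains only the singular values with plain gradient descent on a synthetic regression target, and then merges them back into matrices of the original shapes. The data, learning rate, loss, and all variable names are hypothetical; the paper's actual training setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n = 32, 8, 256
w_v = rng.standard_normal((d_model, d_head))   # value projection of one head
w_o = rng.standard_normal((d_head, d_model))   # matching output projection

# Orthogonalize the V-O pair: the product has rank at most d_head.
u, s, vt = np.linalg.svd(w_v @ w_o, full_matrices=False)
u, s0, vt = u[:, :d_head], s[:d_head], vt[:d_head, :]

# Synthetic task (assumption): the target differs only in its singular values.
s_target = s0 * rng.uniform(0.5, 0.9, size=d_head)
x = rng.standard_normal((n, d_model))
y = x @ (u * s_target) @ vt

def loss_and_grad(s_vec):
    a = x @ u                                  # frozen left vectors applied to inputs
    resid = (a * s_vec) @ vt - y
    loss = 0.5 * np.sum(resid ** 2) / n
    grad = np.sum(a * (resid @ vt.T), axis=0) / n  # d(loss)/d(s_i)
    return loss, grad

s_vec = s0.copy()
initial_loss, _ = loss_and_grad(s_vec)
for _ in range(200):
    _, g = loss_and_grad(s_vec)
    s_vec -= 0.5 * g                           # only d_head scalars are trained
final_loss, _ = loss_and_grad(s_vec)

# Merge the trained values back; shapes match the original projections.
w_v_new, w_o_new = u * s_vec, vt
delta_rank = np.linalg.matrix_rank(w_v_new @ w_o_new - w_v @ w_o)
```

Although only `d_head` scalars are trained here, the resulting change to the head's V-O product spans all `d_head` orthogonal directions, which is the sense in which the update is full-rank for that head.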

Code Implementations: 1 repo