CV · AI · Jun 13, 2024

PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

arXiv:2406.09117v1 · 8 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for efficient deployment of large models by performing compression and fine-tuning simultaneously; it is an incremental advance, since it builds on existing LoRA methods.

The paper tackles the problem of model compression and fine-tuning by introducing PC-LoRA, which gradually removes pre-trained weights to leave only low-rank adapters, achieving compression rates of up to 94.36% for parameters and 89.1% for FLOPs in vision models like ViT-B.
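To see where per-layer savings of this kind come from: replacing a dense weight with LoRA factors B and A shrinks its parameter count from d_out·d_in to r(d_in + d_out). The sketch below is illustrative only. The function name and the rank of 32 are hypothetical choices. The model-wide percentages reported for PC-LoRA also depend on which layers are replaced and on the ranks actually used, which this sketch does not model.

```python
def low_rank_compression(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of a linear layer's weights removed when the dense
    d_out x d_in matrix is replaced by LoRA factors B (d_out x rank)
    and A (rank x d_in). Biases are ignored. Illustrative helper only."""
    dense = d_out * d_in              # parameters in the original weight
    factored = rank * (d_in + d_out)  # parameters in B and A together
    return 1.0 - factored / dense

# Hypothetical example: a 768 x 768 projection at rank 32.
print(f"{low_rank_compression(768, 768, 32):.2%}")  # 91.67%
```

For a 768×768 projection (768 is ViT-B's hidden size) at rank 32, the factors hold 32 × 1536 = 49,152 parameters instead of 589,824, a reduction of 1 − 1/12 ≈ 91.7%. A linear layer's per-token multiply-accumulates scale with the same counts, so its FLOPs saving follows the same ratio. At rank 384 the saving disappears entirely.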

Low-rank adaptation (LoRA) is a prominent method for parameter-efficient fine-tuning that adds a small number of learnable parameters to frozen pre-trained weights. Motivated by the question of whether the LoRA weights alone, without the pre-trained weights, can provide a sufficient representation at the final phase of fine-tuning, we introduce Progressive Compression LoRA (PC-LoRA), which uses low-rank adaptation to perform model compression and fine-tuning simultaneously. PC-LoRA gradually removes the pre-trained weights during training, eventually leaving only the low-rank adapters. These adapters thus replace the pre-trained weights entirely, achieving compression and fine-tuning at the same time. Empirical analysis across various models shows that PC-LoRA achieves parameter and FLOPs compression rates of 94.36% and 89.1%, respectively, for vision models (e.g., ViT-B), and 93.42% and 84.2% for language models (e.g., BERT).
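The idea of gradually removing the pre-trained weights can be sketched for a single linear layer as y = λ(t)·W0·x + B·A·x. Here the weight λ(t) on the frozen path decays from 1 to 0 over training. This is a minimal pure-Python sketch under stated assumptions. The cosine-shaped schedule, the `end_fraction` parameter, and the function names are hypothetical illustrations, not the paper's exact formulation. The knowledge-distillation loss named in the title is not shown.

```python
import math


def decay_factor(step: int, total_steps: int, end_fraction: float = 0.8) -> float:
    """Weight on the frozen pre-trained path: 1.0 at step 0, falling to 0.0
    at end_fraction * total_steps and staying there.
    Assumption: a cosine-shaped schedule; the paper's schedule may differ."""
    end_step = end_fraction * total_steps
    if step >= end_step:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * step / end_step))


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def pc_lora_forward(x: list[float], w0: list[list[float]],
                    a: list[list[float]], b: list[list[float]],
                    lam: float) -> list[float]:
    """y = lam * W0 x + B (A x).
    W0: d_out x d_in (frozen), A: r x d_in, B: d_out x r (trainable)."""
    base = matvec(w0, x)              # frozen pre-trained path
    low_rank = matvec(b, matvec(a, x))  # low-rank adapter path
    return [lam * p + q for p, q in zip(base, low_rank)]


# Hypothetical usage: the pre-trained path fades out as training proceeds.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a, b = [[1.0, 1.0]], [[1.0], [2.0]]
for step in (0, 40, 80):
    lam = decay_factor(step, total_steps=100)
    print(step, round(lam, 3), pc_lora_forward([3.0, 4.0], w0, a, b, lam))
```

At step 0 the layer behaves like ordinary LoRA (W0·x + B·A·x). Once λ reaches 0, the output depends only on A and B, so W0 can be deleted; this is the point where compression happens.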
