LGAI · May 28, 2023

Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals

arXiv:2305.18425v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses storage efficiency for users of large fine-tuned models, but it is an incremental advance because it builds on existing low-rank and quantization techniques.

The paper tackles efficient storage of fine-tuned models by approximating their weight residuals with low-rank factors, substantially reducing memory footprint while preserving performance across tasks and modalities.
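The core idea can be illustrated with a minimal NumPy sketch. It assumes that the "weight residual" is the elementwise difference between fine-tuned and pre-trained weights and that this residual is compressed with a plain truncated SVD, so that only the two thin factors are stored alongside the shared base model. The function names are hypothetical and this is not the authors' released code.

```python
import numpy as np

def encode_residual(w_pre, w_ft, rank):
    """Approximate the residual w_ft - w_pre with a rank-`rank` truncated SVD.

    Returns factors (a, b) such that a @ b approximates w_ft - w_pre.
    """
    delta = w_ft - w_pre
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # (m, r): left singular vectors scaled by singular values
    b = vt[:rank, :]            # (r, n): top right singular vectors
    return a, b

def decode_weights(w_pre, a, b):
    """Rebuild approximate fine-tuned weights from the shared base weights and stored factors."""
    return w_pre + a @ b

def storage_ratio(shape, rank):
    """Fraction of a dense m x n matrix needed to store the two rank-r factors."""
    m, n = shape
    return rank * (m + n) / (m * n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, true_rank = 512, 256, 8
    w_pre = rng.standard_normal((m, n))
    # Synthetic fine-tuning update: low-rank signal plus small noise (illustrative assumption).
    delta = 0.01 * rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
    delta += 1e-4 * rng.standard_normal((m, n))
    w_ft = w_pre + delta

    a, b = encode_residual(w_pre, w_ft, rank=8)
    rel_err = np.linalg.norm(decode_weights(w_pre, a, b) - w_ft) / np.linalg.norm(delta)
    print(f"relative residual error: {rel_err:.4f}")
    print(f"stored fraction of a dense matrix: {storage_ratio((m, n), 8):.4f}")
```

Storing `rank * (m + n)` numbers instead of `m * n` is where the savings come from; the approximation is only accurate to the extent that the residual's singular values decay quickly.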

In this paper, we present an efficient method for storing fine-tuned models by leveraging the low-rank properties of weight residuals. Our key observation is that weight residuals in large overparameterized models exhibit even stronger low-rank characteristics. Based on this insight, we propose Efficient Residual Encoding (ERE), a novel approach that achieves efficient storage of fine-tuned model weights by approximating the low-rank weight residuals. Furthermore, we analyze the robustness of weight residuals and push the limit of storage efficiency by utilizing additional quantization and layer-wise rank allocation. Our experimental results demonstrate that our method significantly reduces memory footprint while preserving performance in various tasks and modalities. We release our code.
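The two extra steps mentioned above, quantization and layer-wise rank allocation, can be sketched as follows. Both rules are stand-ins chosen for illustration and are assumptions, not the paper's method: the SVD factors are quantized with per-row symmetric 8-bit codes, and each layer's rank is picked as the smallest rank that keeps a fixed fraction of the residual's spectral energy (squared Frobenius norm). All names are hypothetical.

```python
import numpy as np

def rank_for_energy(delta, energy=0.99):
    """Smallest rank whose singular values retain `energy` of ||delta||_F^2.

    Hypothetical per-layer allocation rule, used here only for illustration.
    """
    s = np.linalg.svd(delta, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(cum, energy)) + 1, len(s))

def quantize_rows(x, bits=8):
    """Symmetric uniform quantization with one float32 scale per row."""
    assert 2 <= bits <= 8, "codes are stored as int8"
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x), axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # all-zero rows: any nonzero scale works
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale.astype(np.float32)

def dequantize_rows(codes, scale):
    return codes.astype(np.float32) * scale

def encode_layer(w_pre, w_ft, energy=0.99, bits=8):
    """Factorize the residual at a per-layer rank, then quantize both factors."""
    delta = w_ft - w_pre
    r = rank_for_energy(delta, energy)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    a = u[:, :r] * s[:r]  # (m, r)
    b = vt[:r, :]         # (r, n)
    return {"rank": r, "a": quantize_rows(a, bits), "b": quantize_rows(b, bits)}

def decode_layer(w_pre, enc):
    a = dequantize_rows(*enc["a"])
    b = dequantize_rows(*enc["b"])
    return w_pre + a @ b

def encoded_bytes(enc):
    """Bytes for int8 codes plus float32 per-row scales of both factors."""
    return sum(codes.nbytes + scale.nbytes for codes, scale in (enc["a"], enc["b"]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic layers whose residuals have different intrinsic ranks (illustrative only).
    for name, (m, n, k) in {"layer_0": (256, 256, 4), "layer_1": (256, 512, 32)}.items():
        w_pre = rng.standard_normal((m, n)).astype(np.float32)
        delta = 0.01 * rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
        w_ft = w_pre + delta
        enc = encode_layer(w_pre, w_ft, energy=0.99, bits=8)
        err = np.linalg.norm(decode_layer(w_pre, enc) - w_ft) / np.linalg.norm(delta)
        dense = m * n * 4  # bytes for a dense float32 residual
        print(f"{name}: rank={enc['rank']}, rel_err={err:.3f}, bytes={encoded_bytes(enc)}/{dense}")
```

Choosing the rank per layer lets layers with fast-decaying residual spectra use very few factors while harder layers keep more; quantizing the factors then shrinks each stored number from 4 bytes to 1.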
