LG · AI · CL · CV · Nov 10, 2023

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

arXiv:2311.06243v2 · 116 citations · h-index: 19
AI Analysis

This work addresses the high computational cost of finetuning large models, a critical issue for AI researchers and practitioners. The contribution is incremental in that it builds on the existing Orthogonal Finetuning (OFT) method.

The paper tackles the problem of efficiently adapting large foundation models to downstream tasks by proposing Orthogonal Butterfly (BOFT), a parameter-efficient finetuning method that reduces trainable parameters while maintaining performance, achieving competitive results across vision and language tasks.

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
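The butterfly idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes the simplest case: 2x2 rotation blocks parameterized by angles, and a dimension d that is a power of two. The names `butterfly_factor`, `butterfly_orthogonal`, `theta`, and `W0` are hypothetical and chosen only for this illustration. Each of the log2(d) factors is sparse and orthogonal, yet their product is a dense orthogonal matrix, mirroring how the Cooley-Tukey FFT mixes all inputs in log2(d) stages.

```python
import numpy as np

def butterfly_factor(angles, d, stride):
    """One sparse orthogonal factor: 2x2 rotations on index pairs (i, i + stride)."""
    B = np.zeros((d, d))
    k = 0
    for start in range(0, d, 2 * stride):
        for i in range(start, start + stride):
            j = i + stride
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B

def butterfly_orthogonal(theta):
    """Multiply log2(d) butterfly factors; theta has shape (log2(d), d // 2)."""
    m, half = theta.shape
    d = 2 * half
    assert d == 2 ** m, "this sketch assumes d is a power of two"
    R = np.eye(d)
    for level in range(m):
        R = butterfly_factor(theta[level], d, 2 ** level) @ R
    return R

rng = np.random.default_rng(0)
d = 8
theta = rng.uniform(0.2, 1.2, size=(3, d // 2))  # illustrative angles, not learned
R = butterfly_orthogonal(theta)

W0 = rng.standard_normal((d, d))  # random stand-in for a frozen pretrained weight
W = R @ W0                        # orthogonally transformed weight

n_butterfly = theta.size          # (d / 2) * log2(d) = 12 angles
n_full = d * (d - 1) // 2         # 28 degrees of freedom in a full 8x8 rotation
print(n_butterfly, n_full)
```

In this sketch, setting all angles to zero gives the identity matrix, so the transformed weight starts out equal to the original. The parameter count grows as (d/2)·log2(d) rather than d(d-1)/2, which is the kind of saving the abstract refers to. The trade-off is that only a structured subset of orthogonal matrices is reachable.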

Code Implementations: 1 repo