CV · Dec 3, 2024

Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning

Zhaozhi Wang, Conghu Li, Qixiang Ye, Tong Zhang

arXiv:2412.02759v1 · 3.71 citations · h-index: 5

Originality: Highly original

AI Analysis

The paper addresses the problem of oversimplified representations in PEFT for vision tasks, offering a more effective and adaptable fine-tuning method.

The paper tackles the limitation of low-rank representations in parameter-efficient fine-tuning (PEFT) by proposing a Mixture of Physical Priors Adapter (MoPPA) that uses physical equations to model network weights, resulting in up to 2.1% accuracy improvement on VTAB-1K image classification with comparable trainable parameters.

Most parameter-efficient fine-tuning (PEFT) methods rely on low-rank representations to adapt models. However, these approaches often oversimplify representations, particularly when the underlying data has high-rank or high-frequency components. This limitation hinders the model's ability to capture complex data interactions effectively. In this paper, we propose a novel approach that models network weights by leveraging a combination of physical priors, enabling more accurate approximations. We use three foundational equations -- heat diffusion, wave propagation, and Poisson's steady-state equation -- each contributing distinctive modeling properties: heat diffusion enforces local smoothness, wave propagation facilitates long-range interactions, and Poisson's equation captures global equilibrium. To combine these priors effectively, we introduce the Mixture of Physical Priors Adapter (MoPPA), using an efficient Discrete Cosine Transform (DCT) implementation. To dynamically balance these priors, a route regularization mechanism is designed to adaptively tune their contributions. MoPPA serves as a lightweight, plug-and-play module that seamlessly integrates into transformer architectures, with adaptable complexity depending on the local context. Specifically, using MAE pre-trained ViT-B, MoPPA improves PEFT accuracy by up to 2.1% on VTAB-1K image classification with a comparable number of trainable parameters, and advantages are further validated through experiments across various vision backbones, showcasing MoPPA's effectiveness and adaptability. The code will be made publicly available.
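
To make the abstract's idea concrete, the following is a minimal one-dimensional sketch of how heat, wave, and Poisson priors can each be written as a per-frequency filter in the DCT domain and then blended. It is not the authors' implementation: the function names, the softmax gate standing in for the routing mechanism, the `eps` regularizer on the Poisson filter, and the use of continuous frequencies `pi * k / n` are all assumptions made for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D list (naive O(n^2) version)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct(X):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(X)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * X[k]
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n)))
    return out

def prior_filters(n, t=1.0, kappa=1.0, c=1.0, eps=1.0):
    """Per-frequency responses of the three priors (hypothetical parameters).

    heat:    u_t  = kappa * u_xx  ->  exp(-kappa * w^2 * t)  (smooths)
    wave:    u_tt = c^2 * u_xx    ->  cos(c * w * t)         (oscillates, zero initial velocity)
    poisson: -u_xx = f            ->  1 / (w^2 + eps)        (eps keeps the w = 0 mode finite)
    """
    omega = [math.pi * k / n for k in range(n)]
    heat = [math.exp(-kappa * w * w * t) for w in omega]
    wave = [math.cos(c * w * t) for w in omega]
    poisson = [1.0 / (w * w + eps) for w in omega]
    return heat, wave, poisson

def softmax(logits):
    """Numerically stable softmax over a short list of gate logits."""
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def mix_priors(x, gate_logits, **filter_kwargs):
    """Blend the three priors with softmax weights and apply them to x."""
    heat, wave, poisson = prior_filters(len(x), **filter_kwargs)
    w = softmax(gate_logits)  # assumed stand-in for the paper's routing
    X = dct(x)
    Y = [(w[0] * h + w[1] * v + w[2] * p) * Xk
         for h, v, p, Xk in zip(heat, wave, poisson, X)]
    return idct(Y)

if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 3.0]
    print(mix_priors(signal, gate_logits=[0.0, 0.0, 0.0]))
```

Because all three priors act as diagonal filters in the DCT basis, blending them costs only one forward and one inverse transform. In this sketch, pushing the gate toward the heat prior with a large `t` flattens the input toward its mean, which illustrates the local-smoothness behaviour described above.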
