Exploring Representation Invariance in Finetuning
The paper addresses a key issue in adapting foundation models to low-resource tasks, though the contribution is incremental because it builds on existing finetuning methods.
The paper tackles the problem of pretrained representations vanishing during finetuning, which degrades model generalizability. It introduces RIFT, a regularization method that preserves these representations, yielding competitive or improved performance and better preservation of generalizability.
Foundation models pretrained on large-scale natural images are widely adapted to various cross-domain low-resource downstream tasks, benefiting from the generalizable and transferable patterns captured by their representations. However, these representations have been found to gradually vanish during finetuning, accompanied by a degradation of the model's original generalizability. In this paper, we argue that such tasks can be effectively adapted without sacrificing the benefits of pretrained representations. We approach this by introducing \textit{Representation Invariance FineTuning (RIFT)}, a regularization that maximizes the representation similarity between pretrained and finetuned models by leveraging orthogonal invariance of manifolds in a computationally efficient way. Experiments demonstrate that our method is compatible with mainstream finetuning methods, offering competitive or even enhanced performance and better preservation of generalizability.
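The abstract does not state which similarity measure RIFT uses, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes linear CKA as a stand-in similarity that is invariant to orthogonal transformations of the feature space, and adds one minus that similarity to a task loss. The function names `linear_cka` and `regularized_loss` and the `weight` parameter are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_samples, dim) feature matrices.

    Assumption: CKA is used here as a stand-in similarity; the source
    does not say which measure RIFT actually uses.
    """
    # Center each feature dimension over the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


def regularized_loss(task_loss, feats_pretrained, feats_finetuned, weight=0.1):
    """Task loss plus a penalty that grows as the two representations diverge.

    `weight` is a hypothetical hyperparameter chosen for illustration.
    """
    similarity = linear_cka(feats_pretrained, feats_finetuned)
    return task_loss + weight * (1.0 - similarity)
```

Under this choice, rotating the finetuned features by any orthogonal matrix leaves the penalty unchanged, so the regularizer constrains the geometry of the representation rather than the individual coordinates.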