LGAIFeb 6, 2024

On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm

arXiv:2402.03660v227 citationsh-index: 26ICML
Originality Incremental advance
AI Analysis

This finding offers insights into model merging and editing for researchers in deep learning, though it is incremental as it builds on the established pretraining-finetuning paradigm.

The paper discovers Cross-Task Linearity (CTL), where linearly interpolating weights of models finetuned from a common pretrained checkpoint results in features that approximate linear interpolation, providing empirical evidence for this phenomenon.

The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes