CV | AI | Apr 22, 2022

Alleviating Representational Shift for Continual Fine-tuning

Peking U
arXiv:2204.10535v2 | 14 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses catastrophic forgetting for practitioners who continually fine-tune pre-trained models. The contribution appears incremental, since it builds on the known issue of representational shift.

The paper tackles catastrophic forgetting in continual fine-tuning of pre-trained models by identifying that intermediate layers' representational shift disrupts batch normalization, and proposes ConFiT with cross-convolution batch normalization and hierarchical fine-tuning to address this. Experimental results on four datasets show the method outperforms state-of-the-art approaches with lower storage overhead.

We study a practical setting of continual learning: continually fine-tuning a pre-trained model. Previous work has found that, when training on new tasks, the features (penultimate layer representations) of previous data change, a phenomenon called representational shift. Besides the shift of features, we reveal that the intermediate layers' representational shift (IRS) also matters, since it disrupts batch normalization, which is another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method incorporating two components: cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones, and recovers the post-convolution means before testing, which corrects the inaccurate estimates of means under IRS. Hierarchical fine-tuning leverages a multi-stage strategy to fine-tune the pre-trained network, preventing massive changes in Conv layers and thus alleviating IRS. Experimental results on four datasets show that our method remarkably outperforms several state-of-the-art methods with lower storage overhead.
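The idea behind keeping pre-convolution means can be illustrated with a small sketch. It relies only on the fact that a convolution is linear, so the mean of its outputs equals the convolution applied to the mean of its inputs. This is not the authors' implementation: it assumes a 1x1 convolution (a per-pixel linear map), assumes the inputs to the layer stay fixed, and uses made-up activations and weights. The helper names `conv1x1` and `channel_means` are hypothetical.

```python
from statistics import fmean

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to one pixel; x holds one value per input channel."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def channel_means(samples):
    """Per-channel mean over a list of per-pixel channel vectors."""
    return [fmean(channel) for channel in zip(*samples)]

# Hypothetical old-task activations entering the conv layer (3 pixels, 2 channels).
old_inputs = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Made-up conv parameters before and after fine-tuning on a new task.
w_old, b_old = [[1.0, 0.0], [0.5, 0.5]], [0.0, 1.0]
w_new, b_new = [[2.0, -1.0], [0.0, 1.0]], [0.5, 0.0]

# Statistic stored in this sketch: the mean of the conv *inputs*.
pre_mean = channel_means(old_inputs)

# A post-conv mean recorded under the old weights goes stale once weights change.
stale_post_mean = channel_means([conv1x1(x, w_old, b_old) for x in old_inputs])

# Recover the post-conv mean by pushing the stored pre-conv mean through the new weights.
recovered_post_mean = conv1x1(pre_mean, w_new, b_new)

# Reference value: what the old data would actually produce under the new weights.
true_post_mean = channel_means([conv1x1(x, w_new, b_new) for x in old_inputs])

print("stale:    ", stale_post_mean)
print("recovered:", recovered_post_mean)
print("true:     ", true_post_mean)
```

In this toy setting the recovered mean matches the true post-convolution mean under the new weights, while the mean recorded under the old weights does not. Larger kernels, padding, and shifts in the layer's own inputs are not covered by this sketch.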

Code Implementations: 1 repo
