LG, CV · Dec 13, 2023

Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models

Kyra Ahrens, Hans Hergen Lehmann, Jae Hee Lee, Stefan Wermter

arXiv:2312.08888v39.89 citations · h-index: 7 · Has Code · Trans. Mach. Learn. Res.

Originality: Highly original

AI Analysis

This work addresses continual learning for AI systems that need to adapt to new tasks without forgetting, offering a more efficient and effective approach by leveraging intermediate layers of pre-trained models.

The paper tackles continual learning by proposing LayUP, a prototype-based method that uses multi-layer representations from pre-trained models to improve performance without rehearsal. It achieves state-of-the-art results on 13 of 17 benchmarks while reducing memory and computational requirements compared to existing baselines.

We address the Continual Learning (CL) problem, wherein a model must learn a sequence of tasks from non-stationary distributions while preserving prior knowledge upon encountering new experiences. With the advancement of foundation models, CL research has pivoted from the initial learning-from-scratch paradigm towards utilizing generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models primarily focus on separating class-specific features from the final representation layer and neglect the potential of intermediate representations to capture low- and mid-level features, which are more invariant to domain shifts. In this work, we propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require access to prior data, and works out of the box with any foundation model. LayUP surpasses the state of the art in four of the seven class-incremental learning benchmarks, all three domain-incremental learning benchmarks and in six of the seven online continual learning benchmarks, while significantly reducing memory and computational requirements compared to existing baselines. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
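The abstract does not spell out the computation, so the following is only a minimal sketch of one way a prototype classifier could combine multi-layer features with second-order statistics. It assumes the per-layer features have already been extracted from a frozen backbone, and it assumes a ridge-regression-style use of the Gram matrix. The class name, the `ridge` parameter, and the choice of layers are hypothetical and are not taken from the LayUP code.

```python
import numpy as np


class MultiLayerPrototypeSketch:
    """Hypothetical illustration, not the LayUP reference implementation."""

    def __init__(self, feature_dim, num_classes, ridge=1.0):
        # Second-order statistics (Gram matrix) of the concatenated features.
        self.gram = np.zeros((feature_dim, feature_dim))
        # Per-class sums of concatenated features (unnormalized prototypes).
        self.class_sums = np.zeros((feature_dim, num_classes))
        self.ridge = ridge  # assumed regularization strength

    @staticmethod
    def _concat(layer_features):
        # layer_features: list of (n_samples, d_layer) arrays, one per layer.
        return np.concatenate(layer_features, axis=1)

    def update(self, layer_features, labels):
        """Accumulate statistics for one task; raw samples are not stored."""
        x = self._concat(layer_features)
        one_hot = np.eye(self.class_sums.shape[1])[labels]
        self.gram += x.T @ x
        self.class_sums += x.T @ one_hot

    def predict(self, layer_features):
        x = self._concat(layer_features)
        regularized = self.gram + self.ridge * np.eye(self.gram.shape[0])
        # Decorrelate the prototypes with the regularized Gram matrix.
        weights = np.linalg.solve(regularized, self.class_sums)
        return np.argmax(x @ weights, axis=1)
```

Because `update` only adds to two running sums, learning tasks one after another yields the same statistics as a single pass over all the data. In this sketch, that is why no samples from earlier tasks need to be kept.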
