Preserving Linear Separability in Continual Learning by Backward Feature Projection
This work addresses the stability-plasticity trade-off in continual learning, where models must learn tasks sequentially without forgetting earlier ones, and offers an incremental improvement over existing feature distillation methods.
The paper tackles catastrophic forgetting in continual learning by proposing Backward Feature Projection (BFP), which allows new features to change up to a learnable linear transformation of the old features. This preserves the linear separability of old classes while leaving room for new feature directions that accommodate new classes. The method integrates with existing experience replay methods and boosts their performance by a significant margin; linear probing further shows that the learned representation preserves linear separability and achieves high classification accuracy.
Catastrophic forgetting has been a major challenge in continual learning, where the model needs to learn new tasks with limited or no access to data from previously seen tasks. To tackle this challenge, methods based on knowledge distillation in feature space have been proposed and shown to reduce forgetting. However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. BFP preserves the linear separability of the old classes while allowing the emergence of new feature directions to accommodate new classes. BFP can be integrated with existing experience replay methods and boost performance by a significant margin. We also demonstrate that BFP helps learn a better representation space, in which linear separability is well preserved during continual learning and linear probing achieves high classification accuracy. The code can be found at https://github.com/rvl-lab-utoronto/BFP
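The abstract's key argument, that constraining features only "up to a learnable linear transformation" still preserves linear separability, can be illustrated with a minimal NumPy sketch. It assumes, as the name "backward" projection suggests, that a learnable matrix `A` maps the new features back onto the frozen old features and that the penalty is a mean squared error; the function names, shapes, and toy data below are hypothetical and are not taken from the authors' code. The toy holds the new features fixed and fits only `A`. In actual training the features would also be updated, and this term would presumably be added with some weight to the experience-replay objective.

```python
import numpy as np


def bfp_loss(h_new, h_old, A):
    """Mean squared distance between projected new features and frozen old features.

    h_new: (n, d_new) features from the model currently being trained
    h_old: (n, d_old) features from the frozen model of the previous task
    A:     (d_old, d_new) learnable projection, assumed to map new -> old
    """
    diff = h_new @ A.T - h_old  # (n, d_old) residual of the backward projection
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def bfp_grad_A(h_new, h_old, A):
    """Gradient of bfp_loss with respect to A (features held fixed)."""
    n = h_new.shape[0]
    diff = h_new @ A.T - h_old
    return (2.0 / n) * diff.T @ h_new  # (d_old, d_new)


# Hypothetical toy data: old features are a linear function of the first two
# new feature dimensions; the third new dimension is a fresh direction the old
# features ignore, standing in for capacity that new classes could use.
rng = np.random.default_rng(0)
n, d_new, d_old = 200, 3, 2
h_new = rng.normal(size=(n, d_new))
M = np.array([[1.0, -0.5, 0.0],
              [0.3, 0.8, 0.0]])
h_old = h_new @ M.T

# A linear classifier that separated the old classes in the old feature space.
w_old = np.array([1.0, -1.0])

# Fit only the projection A by plain gradient descent on the loss above.
A = np.zeros((d_old, d_new))
for _ in range(500):
    A -= 0.1 * bfp_grad_A(h_new, h_old, A)

# If h_old is approximately A h_new, then w_old . h_old is approximately
# (A^T w_old) . h_new, so the old decision boundary remains available as a
# linear classifier on the new features.
w_new = A.T @ w_old
print("BFP loss:", bfp_loss(h_new, h_old, A))
print("old-class scores preserved:", np.allclose(h_new @ w_new, h_old @ w_old))
print("weight on the new direction:", np.round(A[:, 2], 6))
```

In this toy, `A` places no weight on the third feature dimension. The penalty therefore leaves that direction free, while the old classifier transfers to the new feature space as `A.T @ w_old`. This illustrates why a linear-up-to-projection constraint is looser than matching the old features exactly.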