LG · AI · Dec 8, 2025

Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse

arXiv:2512.07400v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses inefficient buffer usage in continual learning and offers insights that could reduce reliance on large replay buffers, though it appears to be an incremental refinement of existing replay methods.

The paper tackles a paradox in continual learning: neural networks retain linearly separable representations of past tasks even when their predictions fail. It formalizes this as a gap between deep (feature-space) forgetting and shallow (classifier-level) forgetting. In Experience Replay, minimal buffers are enough to prevent deep forgetting, whereas mitigating shallow forgetting requires substantially larger buffers. The analysis proves that any non-zero replay fraction asymptotically guarantees that linear separability is retained.

A persistent paradox in continual learning (CL) is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize this distinction as the gap between deep feature-space and shallow classifier-level forgetting. We reveal a critical asymmetry in Experience Replay: while minimal buffers successfully anchor feature geometry and prevent deep forgetting, mitigating shallow forgetting typically requires substantially larger buffer capacities. To explain this, we extend the Neural Collapse framework to the sequential setting. We characterize deep forgetting as a geometric drift toward out-of-distribution subspaces and prove that any non-zero replay fraction asymptotically guarantees the retention of linear separability. Conversely, we identify that the "strong collapse" induced by small buffers leads to rank-deficient covariances and inflated class means, effectively blinding the classifier to true population boundaries. By unifying CL with out-of-distribution detection, our work challenges the prevailing reliance on large buffers, suggesting that explicitly correcting these statistical artifacts could unlock robust performance with minimal replay.
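The rank-deficiency claim in the abstract rests on a general linear-algebra fact: a sample covariance estimated from n feature vectors has rank at most n - 1, so a buffer smaller than the feature dimension cannot yield a full-rank estimate. The sketch below illustrates only that fact and does not reproduce the paper's analysis. The Gaussian features, the function name `buffer_covariance_rank`, and the sizes used are all hypothetical choices made for illustration.

```python
import numpy as np


def buffer_covariance_rank(n_buffer, dim, seed=0):
    """Return the rank of the sample covariance of n_buffer feature vectors.

    Assumption: features are drawn from an isotropic Gaussian, standing in
    for the penultimate-layer features of one class held in a replay buffer.
    """
    rng = np.random.default_rng(seed)
    feats = rng.standard_normal((n_buffer, dim))  # (n_buffer, dim)
    cov = np.cov(feats, rowvar=False)             # (dim, dim) sample covariance
    return int(np.linalg.matrix_rank(cov))


dim = 64
small = buffer_covariance_rank(n_buffer=5, dim=dim)    # tiny buffer
large = buffer_covariance_rank(n_buffer=500, dim=dim)  # ample buffer

print(f"feature dim = {dim}")
print(f"rank with   5 samples: {small}")
print(f"rank with 500 samples: {large}")
```

With 5 samples the centered data span at most 4 directions, so the estimated covariance is singular in the remaining dimensions; any classifier statistic that relies on it sees no variance there, regardless of the true population covariance.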

