Target Layer Regularization for Continual Learning Using Cramer-Wold Generator
This work addresses catastrophic forgetting in neural networks trained for continual learning, but it appears incremental, since it builds on existing regularization methods.
The paper tackles catastrophic forgetting in continual learning by proposing CW-TaLaR, a regularization strategy that uses the Cramer-Wold distance to preserve target-layer distributions without storing data from previous tasks, and reports results competitive with state-of-the-art models.
We propose an effective regularization strategy (CW-TaLaR) for solving continual learning problems. It uses a penalty term expressed by the Cramer-Wold distance between two probability distributions defined on a target layer of an underlying neural network shared by all tasks, together with the simple architecture of the Cramer-Wold generator for modeling the output data representation. Our strategy preserves the target-layer distribution while learning a new task and does not require remembering the datasets of previous tasks. We perform experiments on several common supervised frameworks, which demonstrate that CW-TaLaR is competitive with a few existing state-of-the-art continual learning models.
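The abstract does not spell out how the Cramer-Wold distance is computed. The following is a minimal, hedged sketch, assuming the closed-form sample estimator used for Cramer-Wold autoencoders: each one-dimensional projection of the samples is smoothed with a Gaussian kernel of variance gamma, and averaging the squared L2 distance over all directions on the unit sphere gives d_CW^2(X, Y) = 1/(2 sqrt(pi gamma)) * [ (1/n^2) sum phi_D(|x_i - x_i'|^2 / 4gamma) + (1/m^2) sum phi_D(|y_j - y_j'|^2 / 4gamma) - (2/(nm)) sum phi_D(|x_i - y_j|^2 / 4gamma) ], with phi_D(s) = 1F1(1/2; D/2; -s). The function names `phi_d`, `cramer_wold_sq` and `regularized_loss`, the Silverman-style default for gamma, and the weight `lam` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def phi_d(s: float, dim: int, tol: float = 1e-15, max_terms: int = 100_000) -> float:
    """phi_D(s) = 1F1(1/2; D/2; -s): the average of exp(-s * <v, u>**2) over
    directions v uniform on the unit sphere in R^D, for a fixed unit vector u."""
    if s < 0:
        raise ValueError("s must be non-negative")
    # Kummer's transformation: 1F1(a; b; -s) = exp(-s) * 1F1(b - a; b; s).
    # The transformed series has only non-negative terms, which avoids the
    # cancellation of the direct alternating series.
    upper, lower = (dim - 1) / 2, dim / 2
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (upper + k) / (lower + k) * s / (k + 1)
        total += term
        if term <= tol * total:
            break
    return math.exp(-s) * total


def cramer_wold_sq(xs: Sequence[Vector], ys: Sequence[Vector],
                   gamma: Optional[float] = None) -> float:
    """Squared Cramer-Wold distance between two samples of points in R^D
    (closed-form estimator, assumed here rather than taken from the paper)."""
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    dim = len(xs[0])
    if gamma is None:
        # Assumed default: Silverman-style bandwidth for roughly standardized data.
        gamma = (4 / (3 * n)) ** 0.4

    def sq_dist(u: Vector, v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def kernel_sum(ps: Sequence[Vector], qs: Sequence[Vector]) -> float:
        # Sum of phi_D over all pairs, i.e. the sphere-averaged Gaussian kernel.
        return sum(phi_d(sq_dist(p, q) / (4 * gamma), dim) for p in ps for q in qs)

    total = (kernel_sum(xs, xs) / (n * n)
             + kernel_sum(ys, ys) / (m * m)
             - 2 * kernel_sum(xs, ys) / (n * m))
    return total / (2 * math.sqrt(math.pi * gamma))


def regularized_loss(task_loss: float, target_acts: Sequence[Vector],
                     generated: Sequence[Vector], lam: float) -> float:
    """Hypothetical penalized objective: task loss plus lam times the squared
    Cramer-Wold distance between current target-layer activations and samples
    drawn from a generator that models the earlier representation."""
    return task_loss + lam * cramer_wold_sq(target_acts, generated)


if __name__ == "__main__":
    old = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    shifted = [[p[0] + 2.0, p[1]] for p in old]
    print(cramer_wold_sq(old, old))      # identical samples: 0.0
    print(cramer_wold_sq(old, shifted))  # a shifted sample gives a positive distance
```

The penalty is zero when the two samples coincide and grows as the target-layer representation drifts away from what the generator reproduces, which matches the stated goal of preserving the target-layer distribution without storing earlier datasets. How CW-TaLaR pairs the two batches and chooses the weight is not stated in the abstract, so `regularized_loss` should be read as one plausible arrangement only.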