LGIV | Mar 25, 2025

Continual Learning With Quasi-Newton Methods

arXiv:2503.19939v1 | 2 citations | h-index: 6 | IEEE Access
Originality: Incremental advance
AI Analysis

This paper addresses catastrophic forgetting in AI systems that learn tasks sequentially, offering an incremental improvement over existing methods such as EWC.

The paper tackles catastrophic forgetting in neural networks by introducing CSQN, a method that uses Quasi-Newton approximations to obtain more accurate Hessian estimates than EWC's diagonal one. On average across four benchmarks, CSQN cuts EWC's forgetting by 50% and improves on EWC's performance by 8%.

Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation where the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. CSQN captures parameter interactions beyond the diagonal without requiring architecture-specific modifications, making it applicable across diverse tasks and architectures. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50 percent and improves its performance by 8 percent on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.
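To make the abstract's point about diagonal versus non-diagonal curvature concrete, here is a minimal sketch on a toy two-parameter problem. It is not the paper's CSQN algorithm: the matrix `H`, the function names, and the choice of the symmetric rank-one (SR1) update are assumptions made for illustration, and the diagonal of `H` stands in for the diagonal Fisher information used by EWC. The sketch compares an EWC-style diagonal penalty with a full quadratic penalty, then builds a curvature estimate from sampled pairs (s, y = Hs) in the spirit of a sampled quasi-Newton method.

```python
import numpy as np


def diagonal_penalty(theta, theta_star, curvature_diag, lam=1.0):
    """EWC-style penalty: each parameter is penalised independently."""
    d = theta - theta_star
    return 0.5 * lam * float(np.sum(curvature_diag * d ** 2))


def full_penalty(theta, theta_star, curvature, lam=1.0):
    """Quadratic penalty with a full curvature matrix (keeps cross terms)."""
    d = theta - theta_star
    return 0.5 * lam * float(d @ curvature @ d)


def sampled_sr1(hess_vec, dim, rng, tol=1e-8):
    """Estimate curvature from `dim` sampled pairs (s, y = H s) with SR1 updates.

    `hess_vec` is a hypothetical callable returning the Hessian-vector product.
    """
    B = np.zeros((dim, dim))
    for _ in range(dim):
        s = rng.standard_normal(dim)   # sampled direction
        y = hess_vec(s)                # curvature observed along s
        r = y - B @ s                  # what the current estimate misses
        denom = float(r @ s)
        # Standard SR1 safeguard: skip the update if the denominator is tiny.
        if abs(denom) > tol * np.linalg.norm(r) * np.linalg.norm(s):
            B += np.outer(r, r) / denom
    return B


# Toy curvature with strongly correlated parameters (illustrative values).
H = np.array([[2.0, 1.8],
              [1.8, 2.0]])
theta_star = np.zeros(2)           # parameters after the old task
theta = np.array([1.0, -1.0])      # candidate parameters for the new task

diag_cost = diagonal_penalty(theta, theta_star, np.diag(H))
full_cost = full_penalty(theta, theta_star, H)
B = sampled_sr1(lambda v: H @ v, dim=2, rng=np.random.default_rng(0))
qn_cost = full_penalty(theta, theta_star, B)

print(diag_cost, full_cost, qn_cost)
```

Along the direction (1, -1) the diagonal penalty evaluates to 2.0, while the full quadratic form gives 0.2, so ignoring the off-diagonal terms overstates the cost of this move by a factor of ten. On this exactly quadratic toy problem the SR1 estimate built from two sampled pairs reproduces `H` up to rounding, so `qn_cost` matches `full_cost`; on a real network the estimate would only approximate the true curvature.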

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
