LG · Dec 17, 2025

Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

arXiv:2512.15267v13 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses efficient continual learning in sparse neural architectures and offers a structurally grounded solution for machine learning researchers. The advance is incremental, since it builds on existing sparse methods.

The paper tackles two problems in sparse neural systems for continual learning: limited cross-task knowledge reuse and performance degradation under high sparsity. It proposes Selective Subnetwork Distillation (SSD), which improves accuracy, retention, and representation coverage on Split CIFAR-10, CIFAR-100, and MNIST.

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.
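
The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the mean-squared error on hidden units, the temperature-softened KL term on logits, and the parameters `n_select`, `T`, and `alpha` are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def top_k_mask(h, k):
    """Boolean mask marking the k largest activations in each row of h."""
    idx = np.argpartition(h, -k, axis=1)[:, -k:]
    mask = np.zeros_like(h, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def top_k_activation(h, k):
    """Keep the k largest activations per sample and zero out the rest."""
    return np.where(top_k_mask(h, k), h, 0.0)

def frequent_neurons(h, k, n_select):
    """Indices of the n_select neurons that most often fall in the Top-K set."""
    freq = top_k_mask(h, k).mean(axis=0)  # activation frequency per neuron
    return np.argsort(freq)[::-1][:n_select]

def softmax(z, T=1.0):
    """Row-wise softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ssd_loss(h_student, h_teacher, logits_student, logits_teacher,
             selected, T=2.0, alpha=0.5):
    """Hypothetical distillation loss (assumed form, not taken from the paper):
    MSE on the selected hidden units plus KL divergence on softened logits."""
    hidden = np.mean((h_student[:, selected] - h_teacher[:, selected]) ** 2)
    p_t = softmax(logits_teacher, T)
    p_s = softmax(logits_student, T)
    eps = 1e-12  # avoids log(0)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1))
    return alpha * hidden + (1.0 - alpha) * kl

# Toy batch: 2 samples, 4 hidden units, Top-2 activation.
h = np.array([[1.0, 5.0, 3.0, 2.0],
              [4.0, 0.0, 6.0, 1.0]])
sparse_h = top_k_activation(h, k=2)
selected = frequent_neurons(h, k=2, n_select=1)
```

In this sketch the previous-task model plays the teacher: only the units in `selected` and the output logits contribute to the loss, so no replayed inputs or task labels are needed to compute it.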

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
