LG · AI · Feb 10, 2025

Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation

arXiv:2502.06192v2 · 2 citations · h-index: 14 · Has Code · ICML
Originality: Highly original
AI Analysis

This work addresses the problem of promoting generalization in deep neural networks and offers an incremental improvement over existing knowledge distillation methods. It is aimed at machine learning practitioners and researchers.

The authors improve knowledge distillation in deep neural networks by introducing a bio-inspired spacing effect into the distillation process. This yields performance gains of up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively.

Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). Although it was originally proposed to train a more compact "student" model from a large "teacher" model, many recent efforts have focused on adapting it to promote generalization of the model itself, such as online KD and self KD. Here, we propose an accessible and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained with a space interval ahead. This strategy is inspired by a prominent theory in biological learning and memory known as the spacing effect, which posits that appropriate intervals between learning trials can significantly enhance learning performance. With both theoretical and empirical analyses, we demonstrate that the benefits of the proposed Spaced KD stem from convergence to a flatter loss landscape during stochastic gradient descent (SGD). We perform extensive experiments to validate the effectiveness of Spaced KD in improving the learning performance of DNNs (e.g., the performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our code has been released on GitHub: https://github.com/SunGL001/Spaced-KD.
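The core mechanism described above, a teacher trained one spacing interval ahead with the student distilling from it, can be illustrated with a small toy. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation (their official code is in the GitHub repository linked above). It assumes a linear softmax classifier, a teacher trained with plain cross-entropy, the standard temperature-scaled KD term T^2 · KL(p_teacher || p_student), and a schedule in which the teacher first takes `interval` SGD steps ahead and the student then replays the same examples while distilling from that ahead teacher. The names `spaced_kd`, `interval`, `alpha`, `temperature`, `make_toy_data`, and the toy dataset itself are hypothetical.

```python
# Toy sketch (assumption): illustrates a spaced teacher/student schedule; NOT the official Spaced-KD code.
import math
import random
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]

def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: Vector, teacher_logits: Vector, temperature: float) -> float:
    """Standard Hinton-style KD term (assumed): T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return temperature ** 2 * sum(a * (math.log(a) - math.log(b)) for a, b in zip(p_t, p_s))

def kd_grad(student_logits: Vector, teacher_logits: Vector, temperature: float) -> Vector:
    """Gradient of kd_loss w.r.t. the student logits: T * (p_s - p_t)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return [temperature * (s - t) for s, t in zip(p_s, p_t)]

class LinearClassifier:
    """Tiny softmax-regression model: logits = W x + b."""
    def __init__(self, n_features: int, n_classes: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_features)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w, self.b)]

    def predict(self, x: Vector) -> int:
        z = self.logits(x)
        return max(range(len(z)), key=z.__getitem__)

    def sgd_step(self, x: Vector, grad_logits: Vector, lr: float) -> None:
        """One SGD update given dLoss/dlogits (chain rule through W x + b)."""
        for c, g in enumerate(grad_logits):
            for d, xd in enumerate(x):
                self.w[c][d] -= lr * g * xd
            self.b[c] -= lr * g

def ce_grad(model: LinearClassifier, x: Vector, y: int) -> Vector:
    """Gradient of cross-entropy w.r.t. logits: softmax(z) - one_hot(y)."""
    p = softmax(model.logits(x))
    return [pc - (1.0 if c == y else 0.0) for c, pc in enumerate(p)]

def spaced_kd(data: List[Example], n_features: int, n_classes: int, interval: int = 10,
              rounds: int = 20, lr: float = 0.1, alpha: float = 0.5,
              temperature: float = 4.0) -> Tuple[LinearClassifier, LinearClassifier]:
    """Hypothetical schedule: teacher trains `interval` steps ahead, then the
    student replays those examples while distilling from the ahead teacher."""
    teacher = LinearClassifier(n_features, n_classes, seed=0)
    student = LinearClassifier(n_features, n_classes, seed=1)
    for r in range(rounds):
        chunk = [data[(r * interval + i) % len(data)] for i in range(interval)]
        for x, y in chunk:  # teacher moves ahead by one spacing interval
            teacher.sgd_step(x, ce_grad(teacher, x, y), lr)
        for x, y in chunk:  # student catches up on the same examples
            g_ce = ce_grad(student, x, y)
            g_kd = kd_grad(student.logits(x), teacher.logits(x), temperature)
            student.sgd_step(x, [(1 - alpha) * a + alpha * b for a, b in zip(g_ce, g_kd)], lr)
    return teacher, student

def make_toy_data(n_per_class: int = 30, seed: int = 42) -> List[Example]:
    """Three well-separated Gaussian blobs in 2-D (hypothetical toy data)."""
    rng = random.Random(seed)
    centers = [(2.0, 0.0), (-1.0, 1.7), (-1.0, -1.7)]
    data = [([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)], label)
            for label, (cx, cy) in enumerate(centers) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data

def accuracy(model: LinearClassifier, data: List[Example]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)

if __name__ == "__main__":
    toy = make_toy_data()
    t, s = spaced_kd(toy, n_features=2, n_classes=3, interval=10)
    print("teacher acc:", accuracy(t, toy), "student acc:", accuracy(s, toy))
```

In this toy, `interval` is the knob that plays the role of the spacing interval: small values keep the teacher only a few steps ahead (close to an ordinary online-KD-style loop), while larger values widen the gap the student distills across. The sketch does not attempt to reproduce the paper's flat-minima analysis or its Tiny-ImageNet results.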

Code Implementations: 1 repo