LG AI | Sep 28, 2023

Low-redundancy Distillation for Continual Learning

RuiQi Liu, Boyu Diao, Libo Huang, Zijia An, Hangda Liu, Zhulin An, Yongjun Xu

arXiv:2309.16117v25.37 citations | h-index: 23 | Has Code

Originality: Incremental advance

AI Analysis

This work targets a practical barrier to deploying continual learning: training cost. The advance is incremental because it builds on existing distillation and rehearsal methods.

The paper argues that continual learning research often neglects training efficiency in favor of accuracy. It proposes Low-redundancy Distillation (LoRD), which reduces redundancy in the student model, the teacher model, and the rehearsal samples, and it reports the highest accuracy with the lowest training FLOPs across the benchmarks tested.

Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical application. Drawing inspiration from the brain's contextual gating mechanism, which selectively filters neural information and continuously updates past memories, we propose Low-redundancy Distillation (LoRD), a novel CL method that enhances model performance while maintaining training efficiency. This is achieved by eliminating redundancy in three aspects of CL: student model redundancy, teacher model redundancy, and rehearsal sample redundancy. By compressing the learnable parameters of the student model and pruning the teacher model, LoRD facilitates the retention and optimization of prior knowledge, effectively decoupling task-specific knowledge without manually assigning isolated parameters for each task. Furthermore, we optimize the selection of rehearsal samples and refine rehearsal frequency to improve training efficiency. Through a meticulous design of distillation and rehearsal strategies, LoRD effectively balances training efficiency and model precision. Extensive experimentation across various benchmark datasets and environments demonstrates LoRD's superiority, achieving the highest accuracy with the lowest training FLOPs.
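The abstract names distillation and rehearsal as the two ingredients but does not give their details, so the following is only a generic sketch of those two building blocks, not the authors' implementation. It shows a temperature-softened distillation loss between teacher and student logits, a fixed-size rehearsal memory, and a reduced replay schedule. Everything here is an assumption made for illustration: the names `distillation_loss`, `ReservoirBuffer`, and `should_rehearse`, the use of reservoir sampling (a common baseline, swapped in because LoRD's own sample-selection rule is not described here), the temperature of 2.0, and the replay interval of 4. Student compression and teacher pruning are not shown.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Generic knowledge-distillation term; the temperature is an assumed value.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class ReservoirBuffer:
    """Fixed-size rehearsal memory filled by reservoir sampling.

    Assumption: stands in for LoRD's sample selection, which is not
    specified in the abstract.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Keep each sample seen so far with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = sample

    def sample(self, k):
        """Draw up to k stored samples without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def should_rehearse(step, every=4):
    """Replay stored samples only on every `every`-th step (assumed schedule)."""
    return step % every == 0
```

In a training loop, one would call `buffer.add(...)` on each incoming sample and, on the steps where `should_rehearse(step)` is true, add the distillation term computed on `buffer.sample(k)` to the task loss. Replaying less often is one simple way to trade rehearsal cost against forgetting, which is the kind of trade-off the abstract describes.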
