Noise as a Resource for Learning in Knowledge Distillation
This work tackles performance gaps, adversarial robustness, and label noise in compact models, offering incremental improvements in collaborative learning frameworks.
The study investigated the effect of adding noise in knowledge distillation to address challenges in deep neural networks, proposing three methods that improve training effectiveness and distill desirable characteristics in student models.
While noise is commonly considered a nuisance in computing systems, a number of studies in neuroscience have shown several benefits of noise in the nervous system from enabling the brain to carry out computations such as probabilistic inference as well as carrying additional information about the stimuli. Similarly, noise has been shown to improve the performance of deep neural networks. In this study, we further investigate the effect of adding noise in the knowledge distillation framework because of its resemblance to collaborative subnetworks in the brain regions. We empirically show that injecting constructive noise at different levels in the collaborative learning framework enables us to train the model effectively and distill desirable characteristics in the student model. In doing so, we propose three different methods that target the common challenges in deep neural networks: minimizing the performance gap between a compact model and large model (Fickle Teacher), training high performance compact adversarially robust models (Soft Randomization), and training models efficiently under label noise (Messy Collaboration). Our findings motivate further study in the role of noise as a resource for learning in a collaborative learning framework.