MLAILGMay 12, 2018

Born Again Neural Networks

arXiv:1805.04770v21221 citations
Originality Highly original
AI Analysis

This work addresses the challenge of enhancing model accuracy without architectural changes, offering a novel approach for researchers and practitioners in computer vision and language modeling.

The paper tackles the problem of improving model performance through knowledge distillation by training students with identical architecture to their teachers, achieving state-of-the-art validation errors of 3.5% on CIFAR-10 and 15.5% on CIFAR-100.

Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness, without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs), outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the effect of the teacher outputs on both predicted and non-predicted classes.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes