CV · Jun 1, 2017

Deep Mutual Learning

arXiv:1706.00384v1 · 1963 citations
Originality: Highly original
AI Analysis

Deep mutual learning addresses the need for efficient knowledge transfer in machine learning, particularly for models that must run with low memory or fast execution, by removing the requirement for a pre-defined, powerful teacher network.

The paper addresses knowledge transfer in model distillation by proposing deep mutual learning, a strategy in which an ensemble of student networks teach each other collaboratively throughout training. It reports compelling results on the CIFAR-100 and Market-1501 benchmarks and outperforms traditional distillation from a static teacher.

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast-execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on the CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
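To make the idea of students teaching each other concrete, here is a minimal sketch of a two-student mutual learning loss for a single labeled example. It assumes the commonly described formulation in which each student minimizes its own cross-entropy plus the KL divergence from its peer's predicted distribution to its own; the abstract itself does not spell out the loss, so treat the exact form as an assumption. The function names (`log_softmax`, `kl_divergence`, `mutual_learning_losses`) are illustrative and not taken from any released implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def kl_divergence(log_p, log_q):
    """D_KL(p || q), given the log-probabilities of p and q."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mutual_learning_losses(logits_1, logits_2, label):
    """Return (loss_1, loss_2) for two peer students on one example.

    Assumed form: each loss is the student's cross-entropy on the true
    label plus the KL divergence from the peer's distribution to its own.
    """
    log_p1 = log_softmax(logits_1)
    log_p2 = log_softmax(logits_2)
    ce_1 = -log_p1[label]
    ce_2 = -log_p2[label]
    # Student 1 is pulled toward student 2's predictions, and vice versa.
    loss_1 = ce_1 + kl_divergence(log_p2, log_p1)
    loss_2 = ce_2 + kl_divergence(log_p1, log_p2)
    return loss_1, loss_2


# Hypothetical logits for a 3-class problem where the true class is 0.
loss_1, loss_2 = mutual_learning_losses([2.0, 0.5, -1.0], [0.1, 1.5, 0.3], 0)
print(loss_1, loss_2)
```

In an actual training loop, each student would typically treat its peer's distribution as a fixed target when computing its own gradient, and the two networks would be updated in turn on each mini-batch. When both students produce identical predictions, the KL terms vanish and each loss reduces to plain cross-entropy.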

Code Implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)

