CV, LG | Jul 14, 2023

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

NVIDIA, U of Toronto
arXiv:2307.07487v1 | 38 citations | h-index: 140
Originality: Highly original
AI Analysis

This work addresses the need for effective pre-training of image backbones without manual annotation, offering a promising alternative to labeled datasets like ImageNet.

The paper tackles self-supervised feature representation learning by introducing DreamTeacher, a framework that distills knowledge from generative models into image backbones. The authors report significant improvements on downstream datasets over both existing self-supervised approaches and ImageNet classification pre-training.

In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
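The two kinds of distillation named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of mean squared error for feature distillation, and the temperature-scaled cross-entropy for label distillation are all assumptions made for illustration, and plain Python lists stand in for real feature maps and logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distillation_loss(teacher_feats, student_feats):
    """Hypothetical feature distillation: mean squared error between
    features from the generative model (teacher) and the backbone (student)."""
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student features must have the same length")
    squared = [(t - s) ** 2 for t, s in zip(teacher_feats, student_feats)]
    return sum(squared) / len(squared)


def label_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical label distillation: cross-entropy of the student's
    predictions against soft labels produced by the teacher's task head."""
    soft_labels = softmax(teacher_logits, temperature)
    predictions = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, predictions))


if __name__ == "__main__":
    # Toy values only; real features would be multi-scale feature maps.
    print(feature_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(label_distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

In this sketch the first loss needs no labels at all, which corresponds to the abstract's first setting (pre-training without manual annotation), while the second assumes the teacher already has a task head that produces logits.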

