LG · CV · ML · Oct 30, 2024

Why Fine-grained Labels in Pretraining Benefit Generalization?

arXiv:2410.23129v2 · 4 citations · h-index: 5 · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

The paper provides a theoretical justification for a common practice in deep learning, addressing an open problem relevant to both researchers and practitioners.

The paper tackles the problem of why pretraining with fine-grained labels improves generalization over coarse-grained pretraining, and proves that fine-grained pretraining enables learning both common and rare features, leading to better accuracy on hard test samples.

Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data for downstream tasks, often yields better generalization than pretraining with coarse-labeled data. While there is ample empirical evidence supporting this, the theoretical justification remains an open problem. This paper addresses this gap by introducing a "hierarchical multi-view" structure to confine the input data distribution. Under this framework, we prove that: 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, leading to improved accuracy on hard downstream test samples.
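One way to build intuition for claims 1) and 2) is to count how often a subclass-specific feature co-occurs with its training label. The sketch below is a hypothetical toy of our own and not the paper's hierarchical multi-view distribution. It assumes that each coarse class k has one shared feature `common_k` and that each of its subclasses j adds one feature `fine_k_j`. The function names are illustrative. The code only measures label-feature co-occurrence. It does not train a network or reproduce the paper's proofs.

```python
from collections import defaultdict


def make_dataset(num_coarse, subs_per_coarse, samples_per_sub):
    """Build a hypothetical two-level toy dataset.

    Each sample is (feature_set, fine_label, coarse_label). Every sample of
    coarse class k carries 'common_k'; every sample of fine subclass (k, j)
    additionally carries its own subclass feature 'fine_k_j'.
    """
    data = []
    for k in range(num_coarse):
        for j in range(subs_per_coarse):
            for _ in range(samples_per_sub):
                feats = {f"common_{k}", f"fine_{k}_{j}"}
                data.append((feats, (k, j), k))
    return data


def feature_frequency(data, use_fine_labels):
    """For each label, return the fraction of its samples containing each feature."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for feats, fine, coarse in data:
        label = fine if use_fine_labels else coarse
        sizes[label] += 1
        for f in feats:
            counts[label][f] += 1
    return {
        label: {f: c / sizes[label] for f, c in fc.items()}
        for label, fc in counts.items()
    }


# Two coarse classes, four subclasses each, ten samples per subclass.
data = make_dataset(num_coarse=2, subs_per_coarse=4, samples_per_sub=10)
coarse = feature_frequency(data, use_fine_labels=False)
fine = feature_frequency(data, use_fine_labels=True)

print(coarse[0]["common_0"], coarse[0]["fine_0_0"])  # 1.0 0.25
print(fine[(0, 0)]["fine_0_0"])  # 1.0
```

With four subclasses per coarse class, a subclass feature appears in only a quarter of its coarse class's samples, whereas under fine labels it appears in every sample of its class. In this toy, a coarse-label objective can be satisfied by the common feature alone, which is one informal way to read why coarse pretraining may leave the rarer features under-learned.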
