LG CV ML · Oct 30, 2024

Why Fine-grained Labels in Pretraining Benefit Generalization?

Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley Chan, Enming Luo

arXiv:2410.23129v2 · 7.94 citations · h-index: 5 · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This paper provides a theoretical justification for a common practice in deep learning, addressing a problem that has remained open for researchers and practitioners.

The paper tackles the problem of why pretraining with fine-grained labels improves generalization over coarse-grained pretraining, and proves that fine-grained pretraining enables learning both common and rare features, leading to better accuracy on hard test samples.

Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data for downstream tasks, often yields better generalization than pretraining with coarse-labeled data. While there is ample empirical evidence supporting this, the theoretical justification remains an open problem. This paper addresses this gap by introducing a "hierarchical multi-view" structure to confine the input data distribution. Under this framework, we prove that: 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, leading to improved accuracy on hard downstream test samples.
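
To make the two pretraining regimes concrete, the following minimal sketch shows how coarse labels can be derived from fine labels in a two-level hierarchy. The class names and the `FINE_TO_COARSE` mapping are hypothetical and not taken from the paper; the sketch illustrates only the label setup, not the "hierarchical multi-view" data model or the proofs.

```python
# Hypothetical two-level label hierarchy; the class names are illustrative only.
FINE_TO_COARSE = {
    "tabby": "cat",
    "siamese": "cat",
    "beagle": "dog",
    "poodle": "dog",
}


def coarsen(fine_labels):
    """Map each fine-grained label to its coarse-grained parent."""
    return [FINE_TO_COARSE[label] for label in fine_labels]


# The same four samples, labeled at the two granularities.
fine_labels = ["tabby", "beagle", "siamese", "poodle"]
coarse_labels = coarsen(fine_labels)

# Fine-grained pretraining must separate more classes than coarse-grained
# pretraining on the same data.
num_fine_classes = len(set(fine_labels))
num_coarse_classes = len(set(coarse_labels))
print(coarse_labels, num_fine_classes, num_coarse_classes)
```

In this toy setup, a network pretrained on `fine_labels` has to tell apart subclasses that share a coarse label, which is the setting in which the abstract claims rare features are learned alongside common ones.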
