CV · LG · Mar 29, 2023

Towards Understanding the Effect of Pretraining Label Granularity

arXiv:2303.16887v2 · 5 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work studies how to choose pretraining strategies for transfer learning in image classification. Its insights could benefit researchers and practitioners, but the contribution is incremental: it adds theoretical and empirical validation to an existing practice.

The paper investigates how pretraining label granularity affects the generalization of deep neural networks in image classification. It finds that pretraining on the fine-grained leaf labels of ImageNet21k transfers better to ImageNet1k than pretraining on coarser levels. A theoretical analysis attributes this to fine-grained pretraining helping the network learn rarer features, which improves accuracy on hard samples.

In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
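As a rough illustration of what "label granularity" means in this setting, the sketch below maps leaf labels to their ancestors at a chosen depth of a label hierarchy, which is one simple way to produce coarser pretraining labels. The toy hierarchy, the label names, and the helpers `path_from_root` and `coarsen` are hypothetical and are not taken from the paper; the authors' actual procedure for ImageNet21k or iNaturalist 2021 may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "beagle": "dog",
    "siamese": "cat",
}


def path_from_root(label: str) -> List[str]:
    """Return the chain of labels from the root down to `label`."""
    path = [label]
    parent = PARENT[label]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path[::-1]


def coarsen(label: str, depth: int) -> str:
    """Map `label` to its ancestor at `depth` (0 = root).

    Labels that sit above the requested depth are returned unchanged.
    """
    path = path_from_root(label)
    return path[min(depth, len(path) - 1)]


# Leaf (finest) labels and their depth-1 (coarser) counterparts.
leaf_labels = ["husky", "beagle", "siamese"]
coarse_labels = [coarsen(name, depth=1) for name in leaf_labels]
```

In this toy example, pretraining at depth 1 would merge `husky` and `beagle` into a single `dog` class, so the labels alone no longer force a model to learn the features that tell the two breeds apart. That is the intuition behind the abstract's claim about "rarer" features.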

