CV · Nov 25, 2020

Grafit: Learning fine-grained image representations with coarse labels

Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou

arXiv:2011.12982v1 · 18.279 citations

Originality: Highly original

AI Analysis

This work is relevant to computer vision researchers and practitioners who need fine-grained image retrieval or classification but only have access to coarsely labeled datasets.

This paper addresses the challenge of learning fine-grained image representations using only coarse training labels, enabling fine-grained category retrieval. The proposed method, Grafit, uses a nearest-neighbor classifier objective and an instance loss, achieving state-of-the-art results on five public benchmarks, including iNaturalist-2018.

This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing the new state of the art on five public benchmarks, like iNaturalist-2018.
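To make the retrieval setting concrete, here is a minimal sketch of nearest-neighbor classification over fixed embeddings. It is not the authors' implementation or training objective: the toy 2-D vectors, the label names, and the helper functions `cosine` and `knn_predict` are all hypothetical, and cosine similarity is an assumed choice of metric. It only illustrates how fine-grained labels can be predicted by voting among a query's nearest neighbors in an embedding space.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_predict(query, gallery, labels, k=3):
    """Majority vote over the k gallery embeddings most similar to the query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine(query, gallery[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy embeddings (made-up values). In the paper's setting, these would come
# from a network trained with coarse labels only (e.g. "dog", "cat").
gallery = [
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 0.8],
]
# Fine-grained labels, assumed to be available only at evaluation time.
fine_labels = ["husky", "husky", "husky", "tabby", "tabby", "tabby"]

prediction = knn_predict([0.95, 0.15], gallery, fine_labels, k=3)
print(prediction)
```

The quality of such a classifier depends entirely on whether the embedding space separates fine-grained categories, which is the property the abstract says the method aims to learn.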
