CV, CL · Nov 5, 2021

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

arXiv:2111.03651v1 · 15 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of reducing reliance on expert annotations for fine-grained recognition. The advance is rated incremental because it builds on existing cross-modal techniques.

The paper tackles fine-grained image recognition without expert labels by using web encyclopedia knowledge and non-expert image descriptions, achieving competitive results on two datasets compared to state-of-the-art cross-modal retrieval methods.

Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless trained specifically with that knowledge in mind. Thus, in this paper we consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis. We evaluate the method on two datasets and compare with several strong baselines and the state of the art in cross-modal retrieval. Code is available at: https://github.com/subhc/clever
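To make the sentence-level matching step in the abstract concrete, here is a minimal sketch. It is not the paper's model: a bag-of-words cosine similarity stands in for the learned textual similarity model, and the function names (`bag_of_words`, `document_score`, `rank_documents`) and the two toy "encyclopedia" entries are invented for illustration. Each sentence of an image description is matched to its closest sentence in a document, and the per-sentence best matches are averaged into a document score.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts; a toy stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_score(description_sentences, document_sentences):
    """Average, over description sentences, of the best-matching document sentence."""
    doc_vectors = [bag_of_words(s) for s in document_sentences]
    best = [
        max(cosine(bag_of_words(d), v) for v in doc_vectors)
        for d in description_sentences
    ]
    return sum(best) / len(best)


def rank_documents(description_sentences, documents):
    """Return document names sorted from best to worst match."""
    scores = {
        name: document_score(description_sentences, sentences)
        for name, sentences in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up example data, not taken from the paper's datasets.
documents = {
    "cardinal": [
        "The male is bright red with a black face.",
        "It has a short thick beak.",
    ],
    "blue jay": [
        "The bird has a blue crest and white chest.",
        "It has a black collar around the neck.",
    ],
}
description = ["This bird is red with a black face.", "It has a short beak."]
ranking = rank_documents(description, documents)
```

Scoring at the sentence level lets one distinctive sentence in a long article drive the match, which is the intuition behind matching "on a sentence-level basis"; the actual method would replace `cosine` over word counts with a trained similarity model.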

Code Implementations: 1 repo
