CV · Nov 1, 2024

Retrieval-enriched zero-shot image classification in low-resource domains

arXiv:2411.00988v123 citations · h-index: 15 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses image classification when data and annotations are scarce, in domains such as medicine and biology. It is an incremental improvement over existing methods.

The paper tackles zero-shot image classification in low-resource domains by using a retrieval-based strategy to enrich image and class representations with textual information from web databases, achieving state-of-the-art performance on a new benchmark covering medical imaging, rare plants, and circuits.

Low-resource domains, characterized by scarce data and annotations, present significant challenges for language and visual understanding tasks, with the latter much under-explored in the literature. Recent advancements in Vision-Language Models (VLMs) have shown promising results in high-resource domains but fall short on low-resource concepts that are under-represented (e.g. only a handful of images per category) in the pre-training set. We tackle the challenging task of zero-shot low-resource image classification from a novel perspective. By leveraging a retrieval-based strategy, we achieve this in a training-free fashion. Specifically, our method, named CoRE (Combination of Retrieval Enrichment), enriches the representation of both query images and class prototypes by retrieving relevant textual information from large web-crawled databases. This retrieval-based enrichment significantly boosts classification performance by incorporating the broader contextual information relevant to the specific class. We validate our method on a newly established benchmark covering diverse low-resource domains, including medical imaging, rare plants, and circuits. Our experiments demonstrate that CoRE outperforms existing state-of-the-art methods that rely on synthetic data generation and model fine-tuning.
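The retrieval-enrichment idea in the abstract can be illustrated with a minimal, training-free sketch. This is not the authors' implementation: the abstract does not give the formulation, so the top-k retrieval, the mean pooling of retrieved text embeddings, the blend weight `alpha`, the function names, and the toy 3-dimensional embeddings are all assumptions made for illustration. A real system would presumably take embeddings from a vision-language model and retrieve captions from a large web-crawled database.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_top_k(query, database, k):
    """Return the k database embeddings most cosine-similar to the query."""
    q = normalize(query)
    ranked = sorted(database, key=lambda d: dot(q, normalize(d)), reverse=True)
    return ranked[:k]

def enrich(embedding, database, k=2, alpha=0.5):
    """Blend an embedding with the mean of its k retrieved text embeddings.

    alpha and mean pooling are assumptions, not taken from the paper.
    """
    retrieved = [normalize(r) for r in retrieve_top_k(embedding, database, k)]
    mean = [sum(col) / len(retrieved) for col in zip(*retrieved)]
    base = normalize(embedding)
    return normalize([alpha * b + (1 - alpha) * m for b, m in zip(base, mean)])

def classify(image_emb, class_embs, database, k=2, alpha=0.5):
    """Enrich the query image and every class prototype, then pick the closest class."""
    query = enrich(image_emb, database, k, alpha)
    scores = {
        name: dot(query, enrich(emb, database, k, alpha))
        for name, emb in class_embs.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: stand-ins for caption embeddings and class-name embeddings.
DATABASE = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],  # captions about a plant
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],  # captions about a circuit component
]
CLASS_EMBS = {"fern": [0.8, 0.3, 0.1], "resistor": [0.1, 0.3, 0.8]}

print(classify([0.7, 0.4, 0.2], CLASS_EMBS, DATABASE))
```

In this sketch, enrichment is applied on both sides, to the query image and to each class prototype, which mirrors the abstract's description. No parameters are trained; only retrieval and averaging are performed.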

