CV · Aug 9, 2024

On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

arXiv:2408.04879v3 · 7 citations · h-index: 18
AI Analysis

The survey addresses the lack of a comprehensive review for computer vision researchers, but its contribution is incremental: it synthesizes existing work rather than introducing new methods.

This paper provides a systematic survey of zero-shot image recognition from an element-wise perspective, integrating tasks like object recognition and compositional recognition into a unified paradigm, and it summarizes benchmarks and applications to guide future research.

Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data in the seen domain. The gist of ZSIR is constructing a well-aligned mapping between the input visual space and the target semantic space, a bottom-up paradigm inspired by how humans observe the world. In recent years, ZSIR has made significant progress across a broad spectrum, from theory to algorithm design, and has found widespread applications. However, to the best of our knowledge, there is still no systematic review of ZSIR from an element-wise perspective, i.e., learning fine-grained elements of data and their inferential associations. To fill this gap, this paper thoroughly investigates recent advances in element-wise ZSIR and provides a sound basis for its future development. Concretely, we first integrate three basic ZSIR tasks, i.e., object recognition, compositional recognition, and foundation model-based open-world recognition, into a unified element-wise paradigm and provide a detailed taxonomy and analysis of the main approaches. Next, we summarize the benchmarks, covering technical implementations and standardized datasets, and collect further details in a library. Last, we sketch related applications, discuss key challenges, and suggest potential future directions.
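
The visual-to-semantic mapping described in the abstract can be illustrated with a minimal sketch. This is not a method from the survey: it assumes a plain ridge-regression map from visual features to attribute vectors, and the function names, the toy attribute vectors, and the synthetic feature generator are all hypothetical, chosen only to show how a map learned on seen classes can be reused to label unseen classes by their semantic descriptions.

```python
import numpy as np


def fit_visual_to_semantic(X, S, reg=1e-2):
    """Ridge regression: W minimizing ||X W - S||^2 + reg * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)


def predict_zero_shot(X, W, class_semantics):
    """Project features into semantic space and pick the class whose
    semantic vector has the highest cosine similarity."""
    Z = X @ W
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    C = class_semantics / (
        np.linalg.norm(class_semantics, axis=1, keepdims=True) + 1e-12
    )
    return np.argmax(Z @ C.T, axis=1)


# Toy setup (all assumed): 4 attributes ("elements"), 16-dim visual features.
rng = np.random.default_rng(0)
n_attr, n_feat = 4, 16

# Seen classes each carry one attribute; unseen classes are new combinations.
seen_sem = np.eye(n_attr)
unseen_sem = np.array(
    [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0]], dtype=float
)

# Hidden linear generator standing in for a visual feature extractor.
A = rng.normal(size=(n_attr, n_feat))


def sample(sem, n):
    """Draw n noisy visual features with labels from the given class set."""
    labels = rng.integers(0, len(sem), size=n)
    X = sem[labels] @ A + 0.05 * rng.normal(size=(n, n_feat))
    return X, labels


# Train only on seen classes.
X_train, y_train = sample(seen_sem, 400)
W = fit_visual_to_semantic(X_train, seen_sem[y_train])

# Evaluate only on classes that never appeared during training.
X_test, y_test = sample(unseen_sem, 200)
pred = predict_zero_shot(X_test, W, unseen_sem)
acc = float(np.mean(pred == y_test))
print(f"zero-shot accuracy on unseen classes: {acc:.2f}")
```

The design choice worth noting is that the unseen classes are described only by attribute vectors, so recognition reduces to nearest-neighbor search in the semantic space. This works in the toy case because the seen classes jointly cover every attribute that the unseen classes recombine.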

Code Implementations: 1 repo