Concept Retrieval -- What and How?
The paper addresses concept-based image retrieval, with applications in narrative understanding. The contribution appears incremental, however, because it builds on existing embedding methods.
The paper tackles the problem of retrieving images that share central concepts, going beyond visual or semantic similarity. It introduces a novel approach that models embedding neighborhoods with a bimodal Gaussian distribution, and it supports the approach's effectiveness with qualitative, quantitative, and human evaluations.
A concept may reflect either a concrete or abstract idea. Given an input image, this paper seeks to retrieve other images that share its central concepts, capturing aspects of the underlying narrative. This goes beyond conventional retrieval or clustering methods, which emphasize visual or semantic similarity. We formally define the problem, outline key requirements, and introduce appropriate evaluation metrics. We propose a novel approach grounded in two key observations: (1) While each neighbor in the embedding space typically shares at least one concept with the query, not all neighbors necessarily share the same concept with one another. (2) Modeling this neighborhood with a bimodal Gaussian distribution uncovers meaningful structure that facilitates concept identification. Qualitative, quantitative, and human evaluations confirm the effectiveness of our approach. See the package on PyPI: https://pypi.org/project/coret/
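The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two observations under stated assumptions. It is not the paper's method or the API of the `coret` package. It retrieves the k nearest neighbors of a query by cosine similarity, projects them onto their first principal component, and fits a two-component 1-D Gaussian mixture with EM so that each mode can be read as a candidate concept group. The function names `knn`, `fit_bimodal_1d`, and `split_neighborhood`, the value of `k`, and the 1-D projection are all hypothetical choices made for illustration.

```python
import numpy as np


def knn(query, gallery, k):
    """Return indices of the k gallery embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:k]


def fit_bimodal_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to x with EM (log-domain E-step)."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])      # initialise the two means at the extremes
    var = np.full(2, x.var() + 1e-6)       # shared initial variance
    w = np.array([0.5, 0.5])               # equal initial mixing weights
    for _ in range(n_iter):
        # E-step: log of weighted Gaussian densities, normalised with logaddexp
        log_dens = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                    - (x[:, None] - mu) ** 2 / (2 * var))
        log_norm = np.logaddexp(log_dens[:, 0], log_dens[:, 1])
        resp = np.exp(log_dens - log_norm[:, None])
        # M-step: re-estimate weights, means and variances from responsibilities
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var, resp


def split_neighborhood(query, gallery, k=20):
    """Hypothetical helper: split the query's k-NN into two candidate concept groups.

    Assumption (not from the paper): the bimodal structure is sought along the
    neighborhood's first principal direction.
    """
    idx = knn(query, gallery, k)
    centered = gallery[idx] - gallery[idx].mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]                # 1-D projection of each neighbor
    _, _, _, resp = fit_bimodal_1d(proj)
    labels = resp.argmax(axis=1)           # assign each neighbor to its likelier mode
    return idx[labels == 0], idx[labels == 1]
```

Projecting to one dimension keeps the EM loop short and easy to check. A more direct alternative would fit a full-dimensional mixture, for example with scikit-learn's `GaussianMixture(n_components=2)`, at the cost of an extra dependency.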