CV | May 24, 2024

Composed Image Retrieval for Remote Sensing

arXiv:2405.15587v3 | 18 citations | h-index: 43 | Has Code | IGARSS
Originality: Incremental advance
AI Analysis

This work addresses a gap in remote sensing image retrieval by making queries more flexible for users in fields such as environmental monitoring and urban planning. It is an incremental advance, since it adapts an existing paradigm to a new domain.

The paper introduces composed image retrieval to remote sensing, enabling queries using image examples combined with textual descriptions to modify attributes like shape or color, and demonstrates that a vision-language model achieves state-of-the-art results without additional training.

This work introduces composed image retrieval to remote sensing. It allows querying a large image archive with an image example modified by a textual description, which gives more descriptive power than unimodal queries, whether visual or textual. The textual part can modify various attributes, such as shape, color, or context. A novel method fusing image-to-image and text-to-image similarity is introduced. We demonstrate that a vision-language model possesses sufficient descriptive power, so that neither a further learning step nor training data is necessary. We present a new evaluation benchmark focused on color, context, density, existence, quantity, and shape modifications. Our work not only sets the state of the art for this task, but also serves as a foundational step in addressing a gap in the field of remote sensing image retrieval. Code at: https://github.com/billpsomas/rscir
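The idea of fusing image-to-image and text-to-image similarity can be illustrated with a minimal sketch. It assumes the query image, the modifier text, and every archive image have already been embedded into one shared space by a vision-language model (that step is not shown). The convex combination used here is one plausible fusion rule chosen for illustration and may differ from the rule in the paper. The names `fused_scores`, `rank_archive`, and the weight `lam` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fused_scores(query_image_emb, query_text_emb, archive_embs, lam=0.5):
    """Score each archive image against a composed (image + text) query.

    ASSUMPTION: a convex combination of the two cosine similarities.
    lam = 0 uses only the image query, lam = 1 uses only the text query.
    """
    q_img = l2_normalize(np.asarray(query_image_emb, dtype=float))
    q_txt = l2_normalize(np.asarray(query_text_emb, dtype=float))
    archive = l2_normalize(np.asarray(archive_embs, dtype=float))

    sim_img = archive @ q_img  # image-to-image similarity, shape (N,)
    sim_txt = archive @ q_txt  # text-to-image similarity, shape (N,)
    return (1.0 - lam) * sim_img + lam * sim_txt


def rank_archive(query_image_emb, query_text_emb, archive_embs, lam=0.5):
    """Return archive indices sorted from best to worst fused score."""
    scores = fused_scores(query_image_emb, query_text_emb, archive_embs, lam)
    return np.argsort(-scores, kind="stable")


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real model outputs.
    archive = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    query_image = np.array([1.0, 0.0])
    query_text = np.array([0.0, 1.0])
    print(rank_archive(query_image, query_text, archive, lam=0.5))
```

In this toy setup the third archive vector is the only one similar to both the image query and the text query, so a balanced weight ranks it first. Setting `lam` to 0 or 1 reduces the scoring to a purely visual or purely textual query.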

Code Implementations: 1 repo