LOCORE: Image Re-ranking with Long-Context Sequence Modeling
The paper addresses the problem of improving image retrieval accuracy in applications such as landmark recognition and product search. The contribution is incremental, as it builds on existing re-ranking techniques.
The proposed model, LOCORE, performs list-wise re-ranking using local descriptors. It outperforms other re-rankers on ROxf, RPar, SOP, In-Shop, and CUB-200, with latency comparable to pair-wise local-descriptor re-rankers.
We introduce LOCORE (Long-Context Re-ranker), a model that takes as input the local descriptors of a query image and of a list of gallery images, and outputs a similarity score between the query and each gallery image. The model is used for image retrieval, where a first ranking is typically computed with an efficient similarity measure, and a shortlist of top-ranked images is then re-ranked with a more fine-grained similarity measure. Existing methods either estimate pair-wise similarity with local descriptors or perform list-wise re-ranking with global descriptors; LOCORE is the first method to perform list-wise re-ranking with local descriptors. To achieve this, we leverage efficient long-context sequence models that capture the dependencies between query and gallery images at the local-descriptor level. At test time, we process long shortlists with a sliding-window strategy designed to work around the limited context size of sequence models. Our approach outperforms other re-rankers on established image retrieval benchmarks of landmarks (ROxf and RPar), products (SOP), fashion items (In-Shop), and bird species (CUB-200), while its latency is comparable to that of pair-wise local-descriptor re-rankers.
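The sliding-window step over a long shortlist can be illustrated with a minimal Python sketch. Only the general idea of scoring the shortlist window by window comes from the text above; the rest is assumed and is not LOCORE's published procedure. The sketch assumes that each image contributes a fixed number of local descriptors, that the query's descriptors are prepended to every window, and that a gallery image covered by several overlapping windows receives the average of its scores. The names `max_window_size`, `sliding_window_rerank`, and `toy_scorer`, and the numbers 4096 and 512, are hypothetical. `toy_scorer` is a dot-product stand-in for the long-context sequence model, which would score each gallery image jointly with the rest of its window.

```python
from typing import Callable, List, Sequence

# Hypothetical list-wise scorer: given the query and a window of gallery
# items, return one similarity score per gallery item in the window.
ListwiseScorer = Callable[[object, Sequence[object]], List[float]]


def max_window_size(context_len: int, descs_per_image: int) -> int:
    """Largest number of gallery images that fit in one input sequence.

    ASSUMPTION: every image contributes `descs_per_image` local descriptors
    and the query's descriptors are repeated at the start of every window.
    """
    n = context_len // descs_per_image - 1
    if n < 1:
        raise ValueError("context too short for the query plus one gallery image")
    return n


def sliding_window_rerank(query, shortlist: Sequence[object],
                          scorer: ListwiseScorer, window: int, stride: int) -> List[int]:
    """Score a long shortlist with overlapping windows.

    Returns shortlist indices sorted by descending score. ASSUMPTION: items
    seen in several windows get the mean of their scores.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window so that no item is skipped")
    n = len(shortlist)
    totals = [0.0] * n
    counts = [0] * n
    start = 0
    while True:
        end = min(start + window, n)
        scores = scorer(query, shortlist[start:end])  # one window per model call
        for offset, score in enumerate(scores):
            totals[start + offset] += score
            counts[start + offset] += 1
        if end == n:  # the last window reached the end of the shortlist
            break
        start += stride
    averaged = [t / c for t, c in zip(totals, counts)]
    return sorted(range(n), key=lambda i: averaged[i], reverse=True)


def toy_scorer(query, window):
    # Stand-in for the sequence model: a plain dot product per gallery item.
    return [sum(q * g for q, g in zip(query, item)) for item in window]


if __name__ == "__main__":
    print(max_window_size(context_len=4096, descs_per_image=512))  # 7
    query = [1.0, 0.0]
    shortlist = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1], [0.3, 0.7]]
    print(sliding_window_rerank(query, shortlist, toy_scorer, window=3, stride=2))  # [3, 1, 2, 4, 0]
```

With a window of 3 and a stride of 2, the five-item shortlist is covered by the windows [0, 3) and [2, 5), so item 2 is scored twice. Averaging is only one possible way to merge scores from overlapping windows; with a pair-wise stand-in like `toy_scorer` both windows give it the same score, whereas a list-wise model could score it differently in each.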