Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
This work addresses accurate and efficient matching of objects in images for applications such as visual search. It appears incremental, since it builds on existing retrieval techniques and adds practical optimizations.
The paper tackles instance-level image retrieval by proposing Patchify, a patch-wise framework that improves accuracy and interpretability without fine-tuning. In experiments across multiple benchmarks, it outperforms global methods and improves state-of-the-art reranking pipelines.
Instance-level image retrieval aims to find images containing the same object as a given query, despite variations in size, position, or appearance. To address this challenging task, we propose Patchify, a simple yet effective patch-wise retrieval framework that offers high performance, scalability, and interpretability without requiring fine-tuning. Patchify divides each database image into a small number of structured patches and performs retrieval by comparing these local features with a global query descriptor, enabling accurate and spatially grounded matching. To assess not just retrieval accuracy but also spatial correctness, we introduce LocScore, a localization-aware metric that quantifies whether the retrieved region aligns with the target object. This makes LocScore a valuable diagnostic tool for understanding and improving retrieval behavior. We conduct extensive experiments across multiple benchmarks, backbones, and region selection strategies, showing that Patchify outperforms global methods and complements state-of-the-art reranking pipelines. Furthermore, we apply Product Quantization for efficient large-scale retrieval and highlight the importance of using informative features during compression, which significantly boosts performance. Project website: https://wons20k.github.io/PatchwiseRetrieval/
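The matching step described in the abstract, in which a global query descriptor is compared against the patch descriptors of each database image, can be illustrated with a minimal sketch. This is not the authors' implementation. The multi-scale grid layout, the max-over-patches scoring rule, the function names (`grid_patches`, `score_image`, `retrieve`), and the hand-written toy descriptors are all assumptions made for illustration; in practice the descriptors would come from a pretrained backbone.

```python
import math

def grid_patches(width, height, levels=(1, 2)):
    """Return (x0, y0, x1, y1) boxes for an assumed multi-scale grid (1x1, 2x2, ...)."""
    boxes = []
    for n in levels:
        for row in range(n):
            for col in range(n):
                boxes.append((col * width // n, row * height // n,
                              (col + 1) * width // n, (row + 1) * height // n))
    return boxes

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score_image(query_desc, patch_descs):
    """Assumed scoring rule: the image score is the best patch similarity.
    Returns (score, index of the best-matching patch)."""
    sims = [cosine(query_desc, d) for d in patch_descs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return sims[best], best

def retrieve(query_desc, database):
    """database maps image_id -> (boxes, patch_descs).
    Returns (image_id, score, best_box) tuples sorted by descending score."""
    results = []
    for image_id, (boxes, descs) in database.items():
        score, idx = score_image(query_desc, descs)
        results.append((image_id, score, boxes[idx]))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy data: 3-dimensional stand-in descriptors, one per patch (hypothetical values).
boxes = grid_patches(4, 4)  # 1 full-image box + 4 quadrant boxes
database = {
    "a": (boxes, [[1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]),
    "b": (boxes, [[0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]),
}
query = [1, 0, 0]
ranked = retrieve(query, database)
for image_id, score, box in ranked:
    print(image_id, round(score, 3), box)
```

Because each result carries the box of its best-matching patch, the ranking is spatially grounded: the box indicates where in the database image the match was found, and a localization-aware metric such as LocScore could compare that region with the target object.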