A comparison of dense region detectors for image search and fine-grained classification
This work addresses the need for better patch extraction in computer vision tasks such as image retrieval and fine-grained classification. It is incremental in that it builds on existing coding approaches.
The paper tackles patch extraction for image classification and search pipelines. It proposes and evaluates alternative dense region detectors based on super-pixels, edges, and Zernike filters, and finds that they outperform the regular dense detector in most cases, improving the state of the art in comparable setups on standard benchmarks.
We consider a pipeline for image classification or search based on coding approaches such as Bag of Words or Fisher vectors. In this context, the most common approach is to extract image patches densely on a regular grid at several scales. This paper proposes and evaluates alternative ways to extract patches densely. Beyond simple strategies derived from regular interest region detectors, we propose approaches based on super-pixels, edges, and a bank of Zernike filters used as detectors. The different approaches are evaluated on recent image retrieval and fine-grained classification benchmarks. Our results show that the regular dense detector is outperformed by other methods in most situations, leading us to improve the state of the art in comparable setups on standard retrieval and fine-grained benchmarks. As a byproduct of our study, we show that existing methods for blob and super-pixel extraction achieve high accuracy if the patches are extracted along the edges and not around the detected regions.
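To make the contrast between the regular dense detector and an edge-driven one concrete, here is a minimal sketch in Python with NumPy. It is not the detector evaluated in the paper. The function names `dense_grid` and `edge_patches`, the grid step, the patch sizes, and the relative gradient threshold are all assumptions chosen for illustration, and the gradient-magnitude test is only a crude stand-in for a real edge detector.

```python
import numpy as np


def dense_grid(height, width, step=8, scales=(16, 24, 32)):
    """Regular dense detector: patch centers on a grid, repeated per patch size.

    Returns a list of (x, y, size) tuples. The step and sizes are
    illustrative values, not the settings used in the paper.
    """
    patches = []
    for size in scales:
        half = size // 2
        # Keep every patch fully inside the image.
        ys = np.arange(half, height - half + 1, step)
        xs = np.arange(half, width - half + 1, step)
        for y in ys:
            for x in xs:
                patches.append((int(x), int(y), size))
    return patches


def edge_patches(image, step=8, scales=(16, 24, 32), rel_threshold=0.5):
    """Hypothetical edge-driven variant: keep grid patches centered on strong gradients.

    A patch is kept when the gradient magnitude at its center is at least
    rel_threshold times the maximum magnitude in the image.
    """
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        return []  # flat image: no edges to sample
    threshold = rel_threshold * peak
    height, width = image.shape
    return [
        (x, y, size)
        for (x, y, size) in dense_grid(height, width, step, scales)
        if magnitude[y, x] >= threshold
    ]


if __name__ == "__main__":
    # Synthetic image with a single vertical edge at column 32.
    image = np.zeros((64, 64))
    image[:, 32:] = 1.0
    print(len(dense_grid(64, 64)), "regular dense patches")
    print(len(edge_patches(image)), "patches centered on the edge")
```

On the synthetic step image, the edge-driven variant keeps only the patches whose centers fall on the vertical edge, which is the behavior the abstract describes as extracting patches along the edges and not around the detected regions. In a full pipeline, each patch would then be described by a local descriptor and aggregated with Bag of Words or Fisher vectors.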