A Dense-Depth Representation for VLAD descriptors in Content-Based Image Retrieval
This work addresses content-based image retrieval in computer vision. Its contribution is incremental, since it builds on existing CNN feature extraction and VLAD aggregation methods.
The paper aims to improve image retrieval performance by introducing a new detector, applied to CNN feature maps, that increases the density of local features available for VLAD aggregation. It reports improved results on the public datasets Holidays, Oxford5k, Paris6k, and UKB.
Recent advances in deep learning have improved performance in image retrieval tasks. The many convolutional layers of a Convolutional Neural Network (CNN) make it possible to obtain a hierarchy of features from the evaluated image. At each level, the extracted patches are smaller than those of the previous level and more representative. Following this idea, this paper introduces a new detector applied to the feature maps extracted from a pre-trained CNN. Specifically, the approach increases the number of features in order to improve the performance of aggregation algorithms such as the widely used VLAD embedding. The proposed approach is tested on several public datasets: Holidays, Oxford5k, Paris6k and UKB.
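To make the aggregation step concrete, the following is a minimal sketch of standard VLAD encoding applied to local descriptors read from a feature map. It is not the detector proposed in the paper. It assumes that every spatial position of a (C, H, W) feature map is used as one C-dimensional local descriptor, and that a codebook of K centroids is already available (in practice usually learned with k-means; random centroids stand in for it here). The function names feature_map_to_descriptors and vlad are hypothetical and chosen only for illustration.

```python
import numpy as np


def feature_map_to_descriptors(fmap):
    """Treat each spatial position of a (C, H, W) map as one C-dim descriptor.

    Assumption for illustration only: this is plain dense sampling,
    not the detector introduced in the paper.
    """
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T  # shape (H*W, C)


def vlad(descriptors, centroids):
    """Aggregate (N, D) descriptors into a K*D VLAD vector using (K, D) centroids."""
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every centroid: (N, K)
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Sum the residuals (descriptor - centroid) per assigned centroid
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            v[i] = (members - centroids[i]).sum(axis=0)

    v = v.ravel()
    # Power (signed square-root) normalization, then L2 normalization
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Usage with synthetic data standing in for a real CNN feature map
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))      # C=8 channels, 4x4 spatial grid
centroids = rng.standard_normal((3, 8))    # K=3 hypothetical codebook entries
desc = feature_map_to_descriptors(fmap)    # 16 descriptors of dimension 8
signature = vlad(desc, centroids)          # vector of length K*D = 24
```

Because each descriptor contributes a residual to its nearest centroid, extracting more local features per image gives every cell of the VLAD vector more samples to aggregate, which is the motivation for a denser detector.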