Asymmetric Image Retrieval with Cross Model Compatible Ensembles
This work addresses efficient and accurate image retrieval for resource-constrained applications such as face recognition, and proposes a way to use model ensembles in the asymmetric retrieval setting.
The paper tackles asymmetric image retrieval with an approach that uses embedding transformation models instead of knowledge distillation. This enables multiple diverse gallery models to be combined with a single lightweight query model, improving accuracy beyond that of any single model while keeping the computational cost of querying low. It reports state-of-the-art results on benchmarks such as CUB-200-2011 and Cars196, with up to 5% improvement in recall@1.
The asymmetric retrieval setting is well suited to resource-constrained applications such as face recognition and image retrieval. In this setting, a large model is used for indexing the gallery while a lightweight model is used for querying. The key principle in such systems is ensuring that both models share the same embedding space. Most methods in this domain are based on knowledge distillation. While useful, they suffer from several drawbacks: they are upper-bounded by the performance of the single best model found, and they cannot be extended to an ensemble of models in a straightforward manner. In this paper we present an approach that does not rely on knowledge distillation; instead, it utilizes embedding transformation models. This allows the use of N independently trained and diverse gallery models (e.g., trained on different datasets or having different architectures) together with a single query model. As a result, we improve the overall accuracy beyond that of any single model while maintaining a low computational budget for querying. Additionally, we propose a gallery image rejection method that utilizes the diversity between multiple transformed embeddings to estimate the uncertainty of gallery images.
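The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the form of the transformation models, the fusion rule, or the uncertainty measure. The sketch assumes linear transformations (one matrix per gallery model), fusion by averaging the unit-normalized transformed embeddings, and an uncertainty score defined as one minus the length of that average. All function names and the rejection threshold are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; all-zero vectors stay zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def transform_gallery(gallery_embs, transforms):
    """Map each gallery model's embeddings into the query embedding space.

    gallery_embs: list of N arrays, each of shape (num_images, d_i)
    transforms:   list of N arrays, each of shape (d_i, d_q)
                  (assumption: the transformation models are linear maps)
    Returns an array of shape (N, num_images, d_q) with unit-norm rows.
    """
    return np.stack(
        [l2_normalize(e @ w) for e, w in zip(gallery_embs, transforms)]
    )


def fuse(transformed):
    """Assumed fusion rule: average over the N models, then renormalize."""
    return l2_normalize(transformed.mean(axis=0))


def disagreement(transformed):
    """Assumed uncertainty score per gallery image.

    The mean of N unit vectors has length 1 when they all coincide and a
    shorter length when they diverge, so 1 - length grows with diversity.
    """
    return 1.0 - np.linalg.norm(transformed.mean(axis=0), axis=-1)


def retrieve(query_embs, fused_gallery, keep_mask, k=1):
    """Rank the kept gallery images by cosine similarity to each query.

    query_embs:    (num_queries, d_q) outputs of the lightweight query model
    fused_gallery: (num_images, d_q) fused gallery embeddings
    keep_mask:     boolean array (num_images,), False marks rejected images
    Returns the indices of the top-k gallery images per query.
    """
    scores = l2_normalize(query_embs) @ fused_gallery.T
    scores[:, ~keep_mask] = -np.inf  # rejected images can never be returned
    return np.argsort(-scores, axis=1)[:, :k]
```

In this sketch a gallery image would be rejected with a rule such as `keep_mask = disagreement(z) <= tau`, where `z` is the output of `transform_gallery` and `tau` is a hypothetical threshold chosen on held-out data. Only the single query model runs at query time; the N gallery models and their transformations are applied once, during indexing.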