Evidential Transformers for Improved Image Retrieval
This work addresses robust and reliable image retrieval for computer vision applications and represents an incremental improvement over existing methods.
The paper tackles content-based image retrieval by introducing an uncertainty-driven transformer model that incorporates probabilistic methods, achieving new state-of-the-art results on datasets like Stanford Online Products and CUB-200-2011.
We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval and obtain robust and reliable results; in particular, when used as a baseline training objective for deep metric learning, evidential classification surpasses conventional training based on multiclass classification. Furthermore, we improve state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, which sets a new state of the art in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.
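The abstract does not spell out the evidential classification objective. The sketch below is a minimal illustration only and not the authors' implementation. It assumes the widely used evidential deep learning formulation of Sensoy et al. (2018): non-negative evidence is obtained from the network outputs with a ReLU, the Dirichlet parameters are alpha = evidence + 1, and training minimizes the expected squared error under that Dirichlet. The KL regularization term from that formulation is omitted for brevity. All function names are hypothetical.

```python
import numpy as np

def dirichlet_from_logits(logits):
    """Map raw network outputs to Dirichlet parameters (assumes ReLU evidence)."""
    evidence = np.maximum(logits, 0.0)             # non-negative evidence per class
    alpha = evidence + 1.0                         # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)   # Dirichlet strength S = sum_k alpha_k
    return alpha, strength

def evidential_mse_loss(logits, labels, num_classes):
    """Expected squared error under Dir(alpha), averaged over the batch (KL term omitted)."""
    alpha, strength = dirichlet_from_logits(logits)
    y = np.eye(num_classes)[labels]                # one-hot targets
    p = alpha / strength                           # expected class probabilities
    err = (y - p) ** 2                             # squared error of the Dirichlet mean
    var = p * (1.0 - p) / (strength + 1.0)         # Dirichlet variance term
    return float((err + var).sum(axis=-1).mean())

def uncertainty(logits):
    """Total uncertainty u = K / S in (0, 1]; u = 1 means no evidence for any class."""
    alpha, strength = dirichlet_from_logits(logits)
    return (alpha.shape[-1] / strength).squeeze(-1)

# Usage: one confident sample and one sample with no evidence at all.
logits = np.array([[10.0, 0.0, 0.0], [-1.0, -2.0, -3.0]])
labels = np.array([0, 0])
print(uncertainty(logits))                         # low u for the first row, u = 1 for the second
print(evidential_mse_loss(logits, labels, num_classes=3))
```

Unlike a softmax classifier, which always outputs a normalized distribution, this parameterization yields an explicit uncertainty u = K/S that grows when the network provides little evidence. That property is what an uncertainty-driven retrieval model can exploit.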