CV | IR | LG | Sep 2, 2024

Evidential Transformers for Improved Image Retrieval

arXiv:2409.01082v2 | 1 citation | h-index: 20
AI Analysis

This work addresses the problem of robust and reliable image retrieval for applications in computer vision, representing an incremental improvement over existing methods.

The paper tackles content-based image retrieval by introducing an uncertainty-driven transformer model that incorporates probabilistic methods, achieving new state-of-the-art results on datasets like Stanford Online Products and CUB-200-2011.

We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval, achieving robust and reliable results, with evidential classification surpassing traditional training based on multiclass classification as a baseline for deep metric learning. Furthermore, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.
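The abstract does not spell out the evidential formulation, so the following is only a minimal sketch. It assumes the common Dirichlet-based form of evidential classification, in which a network's outputs are treated as non-negative evidence per class and the total evidence determines an explicit uncertainty score. The function name `evidential_outputs`, the ReLU evidence mapping, and the toy logits are illustrative assumptions, not details taken from the paper.

```python
def evidential_outputs(logits):
    """Turn raw class logits into Dirichlet parameters, expected
    class probabilities, and a scalar uncertainty in (0, 1]."""
    # Assumption: evidence is obtained with a ReLU; other non-negative
    # mappings (softplus, exp) are also used in practice.
    evidence = [max(0.0, z) for z in logits]

    # Dirichlet concentration parameters: one pseudo-count per class
    # plus the evidence collected for that class.
    alpha = [e + 1.0 for e in evidence]

    # Dirichlet strength: the total of all concentration parameters.
    strength = sum(alpha)
    num_classes = len(logits)

    # Expected class probabilities under the Dirichlet.
    probs = [a / strength for a in alpha]

    # Uncertainty mass: equals 1 with no evidence and shrinks as
    # evidence accumulates.
    uncertainty = num_classes / strength
    return alpha, probs, uncertainty


# Hypothetical logits for a 3-class problem.
confident = evidential_outputs([9.0, 0.0, 0.0])  # strong evidence for class 0
unsure = evidential_outputs([0.0, 0.0, 0.0])     # no evidence for any class

print("confident uncertainty:", confident[2])
print("unsure uncertainty:", unsure[2])
```

Unlike a softmax, which always produces a normalized distribution regardless of how much support the input provides, this formulation reports low uncertainty only when some class has accumulated evidence. That property is the usual motivation for using evidential outputs in settings where reliability matters, such as retrieval.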
