CV · May 28, 2023

Bayesian Decision Making to Localize Visual Queries in 2D

arXiv:2305.17611v1 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of improving localization accuracy in egocentric video datasets like EGO4D, representing an incremental advancement over existing methods.

The paper tackles false positives in visual query localization with a Bayesian decision-making approach. A transformer-based similarity computed in a higher-dimensional space serves as the prior, a Siamese-network similarity computed in a lower-dimensional space serves as the measurement, and the two are combined into a posterior similarity, with the aim of reducing false positives.

This report describes our approach for the EGO4D 2023 Visual Query 2D Localization Challenge. Our method aims to reduce the number of False Positives (FP) that occur because of high similarity between the visual crop and the proposed bounding boxes from the baseline's Region Proposal Network (RPN). Our method uses a transformer to determine similarity in higher dimensions, which serves as our prior belief. This prior is then combined with the lower-dimensional similarity from the Siamese Head, which acts as our measurement, to generate a posterior that determines the final similarity between the visual crop and the proposed bounding box. Our code is publicly available at https://github.com/s-m-asjad/EGO4D_VQ2D.
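
The prior/measurement/posterior idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact fusion rule, so everything here is an assumption made for illustration: both similarities are treated as probabilities in [0, 1] that a proposal matches the visual crop, the two scores are treated as independent, and they are fused with a standard binary Bayes update. The names `posterior_similarity` and `filter_proposals`, and the 0.5 threshold, are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Tuple


def posterior_similarity(prior: float, measurement: float) -> float:
    """Fuse two similarity scores in [0, 1] with a binary Bayes update.

    prior:       assumed transformer-based similarity (prior belief of a match)
    measurement: assumed Siamese-head similarity (evidence for a match)
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= measurement <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    match = prior * measurement                      # evidence for "match"
    no_match = (1.0 - prior) * (1.0 - measurement)   # evidence for "no match"
    total = match + no_match
    if total == 0.0:
        # The two scores are certain and contradict each other; stay undecided.
        return 0.5
    return match / total


def filter_proposals(
    scores: List[Tuple[float, float]], threshold: float = 0.5
) -> List[int]:
    """Return indices of proposals whose posterior exceeds the threshold.

    scores holds one (prior, measurement) pair per proposed bounding box.
    """
    return [
        i
        for i, (prior, measurement) in enumerate(scores)
        if posterior_similarity(prior, measurement) > threshold
    ]
```

Under these assumptions, a proposal with a high Siamese score but a low transformer prior is pulled down: `posterior_similarity(0.2, 0.9)` is roughly 0.69, below the raw measurement of 0.9, which is the kind of effect that would suppress false positives when the two scores disagree.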

Code Implementations: 1 repo
