CV · Apr 19, 2015

Visual Recognition Using Directional Distribution Distance

arXiv:1504.04792v2 · 2 citations
AI Analysis

This work addresses the need for efficient set comparison in computer vision. It offers a novel discriminative approach that outperforms existing generative methods such as FV and VLAD, and combining it with FV yields further, incremental improvements.

The paper tackles the problem of comparing sets of instance vectors in visual recognition. It proposes a discriminative method, D3, which uses a directional total variation distance to measure how well two distributions are separated, and it achieves excellent accuracy and speed on action and image recognition tasks.

In computer vision, an entity such as an image or a video is often represented as a set of instance vectors, which can be SIFT, motion, or deep learning feature vectors extracted from different parts of that entity. It is therefore essential to design efficient and effective methods for comparing two sets of instance vectors. Existing methods such as FV (Fisher Vector), VLAD (Vector of Locally Aggregated Descriptors), and Super Vectors have achieved excellent results. However, this paper shows that these methods are designed from a generative perspective, and that a discriminative method can be more effective for categorizing images or videos. The proposed D3 (discriminative distribution distance) method treats the two sets as two distributions and uses a directional total variation distance (DTVD) to measure how separated they are. A classifier-based method is further proposed to estimate DTVD robustly. D3 is evaluated on action and image recognition tasks and achieves excellent accuracy and speed. D3 also has a synergy with FV: the combination of D3 and FV has advantages over D3, FV, and VLAD alone.
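The abstract links DTVD to a classifier. One standard connection, which may or may not match the paper's exact formulation: for two distributions with equal priors, the lowest achievable classification error is (1 - TV)/2, so a classifier's error indicates how separated the two distributions are. The minimal sketch below applies that idea along a single direction w, assumed to be given (choosing it is where a learned classifier would come in). It projects both instance sets onto w and finds the best single-threshold classifier. The function name `directional_separation` is hypothetical, and this is not the paper's DTVD estimator. Because a threshold only separates half-lines, the returned value equals the Kolmogorov-Smirnov statistic of the two projections, which is a lower bound on their total variation distance.

```python
import numpy as np


def directional_separation(set_a, set_b, w):
    """Hypothetical helper: classifier-based separation of two instance sets along w.

    Each set is an (n, d) array of instance vectors (e.g. local features of one
    image). Both sets are projected onto the unit vector w, and the single
    threshold minimizing the balanced (equal-prior) error is found. Returns
    1 - 2 * that error, i.e. the Kolmogorov-Smirnov statistic of the projections,
    a lower bound on the total variation distance between them.
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)  # only the direction matters, not the scale
    pa = np.asarray(set_a, dtype=float) @ w  # 1-D projections of set A
    pb = np.asarray(set_b, dtype=float) @ w  # 1-D projections of set B

    # The largest CDF gap of two step functions occurs at a sample value,
    # so every projected value is tried as a candidate threshold.
    thresholds = np.concatenate([pa, pb])
    cdf_a = np.searchsorted(np.sort(pa), thresholds, side="right") / pa.size
    cdf_b = np.searchsorted(np.sort(pb), thresholds, side="right") / pb.size

    # Rule "A if x <= t else B" has balanced error 0.5 * (1 - (F_A(t) - F_B(t)));
    # the reversed rule flips the sign, so the absolute gap covers both rules.
    gap = np.max(np.abs(cdf_a - cdf_b))
    best_balanced_error = 0.5 * (1.0 - gap)
    return 1.0 - 2.0 * best_balanced_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(500, 2))
    b = rng.normal(0.0, 1.0, size=(500, 2)) + np.array([3.0, 0.0])
    print(directional_separation(a, b, [1.0, 0.0]))  # sets differ along this axis
    print(directional_separation(a, b, [0.0, 1.0]))  # sets overlap along this axis
```

With two Gaussian clouds offset along the first axis, the value along [1, 0] should be large and the value along [0, 1] close to zero. This is the sense in which such a distance is directional: the same pair of sets can look well separated along one direction and indistinguishable along another.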
