CV · Mar 8, 2018

Learning Effective Binary Visual Representations with Deep Networks

arXiv:1803.03004v1 · 1 citation
AI Analysis

This work addresses the need for efficient and generalizable binary representations in computer vision, offering a novel approach that improves performance across multiple tasks.

The paper tackles the problem of generating binary visual representations for tasks beyond retrieval, such as classification and detection. It proposes the Approximately Binary Clamping (ABC) method, which achieves accuracy comparable to real-valued methods on ImageNet classification and generalizes better in object detection.

Binary visual representations have traditionally been designed to reduce computational and storage costs in image retrieval. This paper argues that they can also be applied to large-scale recognition and detection problems, not only to hashing for retrieval. Furthermore, their binary nature may help them generalize better than their real-valued counterparts. Existing binary hashing methods are either two-stage or hinge on loss-term regularization or saturating functions; as a result, they converge slowly and emit only soft binary values. This paper proposes Approximately Binary Clamping (ABC), which is non-saturating and end-to-end trainable, converges quickly, and outputs true binary visual representations. ABC achieves accuracy comparable to its real-valued counterpart on ImageNet classification and even generalizes better in object detection. On benchmark image retrieval datasets, ABC also outperforms existing hashing methods.
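To make the storage and retrieval argument concrete, here is a minimal sketch of how binary codes are typically used once they exist. It is not the ABC method itself, whose details are not given in the abstract: the sign-thresholding step and the helper names `binarize`, `hamming_distance`, and `retrieve` are assumptions chosen purely for illustration.

```python
def binarize(features, threshold=0.0):
    """Pack a real-valued vector into an int, one bit per dimension.

    Assumption: simple thresholding stands in for whatever a trained
    network would emit; this is NOT the paper's ABC layer.
    """
    code = 0
    for i, value in enumerate(features):
        if value > threshold:
            code |= 1 << i  # set bit i when the feature is "on"
    return code


def hamming_distance(code_a, code_b):
    """Count the bit positions where two packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda idx: hamming_distance(query_code, database_codes[idx]),
    )
    return ranked[:top_k]


# Toy 4-dimensional features (made-up values for illustration only).
database = [
    [0.9, -0.2, 0.4, -1.1],
    [-0.5, 0.3, -0.7, 0.8],
    [0.1, 0.6, 0.2, -0.3],
]
database_codes = [binarize(vec) for vec in database]

query_code = binarize([0.7, -0.1, 0.5, -0.9])
nearest = retrieve(query_code, database_codes)
print(nearest)
```

Each 4-dimensional vector here fits in 4 bits instead of four floating-point numbers, and comparison reduces to an XOR plus a bit count, which is where the computational and storage savings mentioned above come from.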
