CV · Nov 26, 2014

3D-Assisted Image Feature Synthesis for Novel Views of an Object

arXiv:1412.0003v1 · 29 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of comparing images under large viewpoint changes, though it appears incremental because it builds on existing 3D-assisted methods.

The paper tackles view-invariant image comparison by synthesizing features for novel views of an object from a single input image. It uses aligned 3D models to predict and transfer features, and reports that this approach significantly outperforms traditional image features in fine-grained retrieval and classification tasks.

Comparing two images in a view-invariant way has been a challenging problem in computer vision for a long time, as visual features are not stable under large viewpoint changes. In this paper, given a single input image of an object, we synthesize new features for other views of the same object. To accomplish this, we introduce an aligned set of 3D models in the same class as the input object image. Each 3D model is represented by a set of views, and we study the correlation of image patches between different views, seeking what we call surrogates: patches in one view whose feature content predicts well the features of a patch in another view. In particular, for each patch in the novel desired view, we seek surrogates from the observed view of the given image. For a given surrogate, we predict that surrogate using a linear combination of the corresponding patches of the 3D model views, learn the coefficients, and then transfer these coefficients on a per-patch basis to synthesize the features of the patch in the novel view. In this way we can create feature sets for all views of the latent object, giving us a multi-view representation of the object. View-invariant object comparisons are achieved simply by computing the $L^2$ distances between the features of corresponding views. We provide theoretical and empirical analysis of the feature synthesis process, and evaluate the proposed view-agnostic distance (VAD) in fine-grained image retrieval (100 object classes) and classification tasks. Experimental results show that our synthesized features do enable view-independent comparison between images and perform significantly better than traditional image features in this respect.
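The coefficient-transfer step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array shapes, the ridge-regularized least-squares fit, and the choice to sum per-view distances are all assumptions made for illustration, and the surrogate-selection step is omitted. The idea shown is only this: fit coefficients that reconstruct a surrogate patch's feature from the corresponding patches of the 3D model views, then reuse those coefficients on the models' patches in the novel view.

```python
import numpy as np


def fit_coefficients(model_feats_observed, surrogate_feat, ridge=1e-3):
    """Fit coefficients c so that model_feats_observed.T @ c ~= surrogate_feat.

    model_feats_observed: (n_models, d) features of the corresponding patch
        in the observed view, one row per 3D model.
    surrogate_feat: (d,) feature of the surrogate patch in the input image.
    ridge: small regularizer (an assumption; the abstract does not specify one).
    """
    A = model_feats_observed.T                      # (d, n_models)
    gram = A.T @ A + ridge * np.eye(A.shape[1])     # regularized normal equations
    return np.linalg.solve(gram, A.T @ surrogate_feat)


def synthesize_patch(model_feats_novel, coeffs):
    """Transfer the learned coefficients to the novel view.

    model_feats_novel: (n_models, d_novel) features of the target patch in the
        novel view, one row per 3D model.
    Returns the synthesized (d_novel,) feature for that patch.
    """
    return model_feats_novel.T @ coeffs


def view_agnostic_distance(views_a, views_b):
    """Compare two multi-view representations of shape (n_views, d).

    Computes the L2 distance between features of corresponding views and
    sums over views (the aggregation is an assumption of this sketch).
    """
    return float(np.linalg.norm(views_a - views_b, axis=1).sum())
```

In this sketch each novel-view patch would be synthesized independently, by calling `fit_coefficients` on its surrogate and then `synthesize_patch`; stacking the results over all patches and views would give the multi-view representation passed to `view_agnostic_distance`.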

