CV · Mar 29, 2016

Learning a Predictable and Generative Vector Representation for Objects

Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, Abhinav Gupta

arXiv:1603.08637v2 · 35.9744 citations

Originality: Incremental advance

AI Analysis

The paper addresses the need for versatile object representations in computer vision, though the contribution appears incremental because it combines existing autoencoder and convolutional network components.

The paper tackles the problem of learning a vector representation for objects that is both generative in 3D and predictable from 2D images, proposing a TL-embedding network that achieves this and enables tasks like voxel prediction and 3D model retrieval.

What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable. This enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval. Extensive experimental analysis demonstrates the usefulness and versatility of this embedding.
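The two-component design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear layers, the tiny dimensions (`VOXELS`, `PIXELS`, `EMBED`), the random weights, and the specific loss functions are all assumptions chosen for brevity, and a plain linear map stands in for the convolutional network. It shows only the data flow: a voxel autoencoder defines the embedding, an image branch is trained to match that embedding, and at test time the image branch is chained with the decoder to predict voxels from a 2D image.

```python
import math
import random

random.seed(0)

# Toy sizes, assumed for illustration only (not taken from the paper).
VOXELS = 8   # length of a flattened voxel grid
PIXELS = 6   # length of a flattened image
EMBED = 3    # embedding dimension


def make_matrix(rows, cols):
    """Random weight matrix; stands in for learned parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


W_enc = make_matrix(EMBED, VOXELS)   # autoencoder: voxels -> embedding
W_dec = make_matrix(VOXELS, EMBED)   # autoencoder: embedding -> voxels
W_img = make_matrix(EMBED, PIXELS)   # image branch (linear stand-in for a ConvNet)


def encode_voxels(voxels):
    return matvec(W_enc, voxels)


def decode_embedding(z):
    """Per-voxel occupancy probabilities."""
    return [sigmoid(a) for a in matvec(W_dec, z)]


def embed_image(image):
    return matvec(W_img, image)


def reconstruction_loss(voxels, probs, eps=1e-9):
    """Mean binary cross-entropy; keeps the embedding generative."""
    total = sum(v * math.log(p + eps) + (1 - v) * math.log(1 - p + eps)
                for v, p in zip(voxels, probs))
    return -total / len(voxels)


def embedding_loss(z_image, z_voxels):
    """Squared Euclidean distance; keeps the embedding predictable from 2D."""
    return sum((a - b) ** 2 for a, b in zip(z_image, z_voxels))


def predict_voxels_from_image(image):
    """Test-time path: image -> embedding -> decoded voxels."""
    return decode_embedding(embed_image(image))


# One paired training example (made-up values).
voxels = [1, 0, 0, 1, 1, 0, 1, 0]
image = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]

z_vox = encode_voxels(voxels)
z_img = embed_image(image)
recon = reconstruction_loss(voxels, decode_embedding(z_vox))
match = embedding_loss(z_img, z_vox)
predicted = predict_voxels_from_image(image)
```

In a real training loop both losses would be minimized by gradient descent over many image and 3D model pairs; here the weights stay random, so `predicted` only demonstrates shapes and data flow, not a meaningful reconstruction.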
