CV LG NE

Dec 22, 2014

Convolutional Neural Networks for joint object detection and pose estimation: A comparative study

Francisco Massa, Mathieu Aubry, Renaud Marlet

arXiv:1412.7190v4

18 citations

Originality: Incremental advance

AI Analysis

This paper addresses 3D scene understanding for computer vision applications, but its contribution is incremental because it builds on existing CNNs and benchmarks.

The paper tackles joint object detection and 3D pose estimation in still images by comparing different feature representations and energies in convolutional neural networks, showing that a classification approach on discretized viewpoints achieves state-of-the-art performance on the Pascal3D+ benchmark with significant improvements over existing baselines.

In this paper we study the application of convolutional neural networks for jointly detecting objects depicted in still images and estimating their 3D pose. We identify different feature representations of oriented objects, and energies that lead a network to learn these representations. The choice of the representation is crucial since the pose of an object has a natural, continuous structure while its category is a discrete variable. We evaluate the different approaches on the joint object detection and pose estimation task of the Pascal3D+ benchmark using Average Viewpoint Precision. We show that a classification approach on discretized viewpoints achieves state-of-the-art performance for joint object detection and pose estimation, and significantly outperforms existing baselines on this benchmark.
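To make the contrast between a discrete and a continuous pose representation concrete, here is a minimal sketch, not the authors' implementation. It assumes the pose is reduced to a single azimuth angle in degrees, that the number of bins is 8, and that bins are centered on multiples of the bin width; none of these choices is stated in the abstract. The function names `azimuth_to_bin` and `azimuth_to_unit_vector` are hypothetical.

```python
import math


def azimuth_to_bin(azimuth_deg, num_bins=8):
    """Discretize an azimuth (degrees) into a class index in [0, num_bins).

    Assumption: bins are centered on multiples of 360 / num_bins, so
    bin 0 covers the interval around 0 degrees and wraps past 360.
    """
    width = 360.0 / num_bins
    wrapped = azimuth_deg % 360.0  # map any angle into [0, 360)
    return int((wrapped + width / 2.0) // width) % num_bins


def azimuth_to_unit_vector(azimuth_deg):
    """Continuous alternative: embed the angle on the unit circle.

    Nearby angles (e.g. 359 and 1 degrees) map to nearby points, which a
    raw angle value or a bin index does not guarantee.
    """
    rad = math.radians(azimuth_deg)
    return (math.cos(rad), math.sin(rad))
```

With the discretized form, viewpoint estimation becomes a classification problem with `num_bins` classes per object category; the unit-vector form would instead be a regression target. The abstract reports that the classification approach performed best on Pascal3D+.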
