CV, LG, NE
Dec 22, 2014

Convolutional Neural Networks for joint object detection and pose estimation: A comparative study

arXiv:1412.7190v4
18 citations
Originality: Incremental advance
AI Analysis

This paper addresses 3D scene understanding for computer vision applications, but its contribution is incremental: it builds on existing CNNs and benchmarks.

The paper tackles joint object detection and 3D pose estimation in still images by comparing different feature representations, and the energies used to learn them, in convolutional neural networks. It shows that a classification approach on discretized viewpoints achieves state-of-the-art performance on the Pascal3D+ benchmark and significantly outperforms existing baselines.

In this paper we study the application of convolutional neural networks for jointly detecting objects depicted in still images and estimating their 3D pose. We identify different feature representations of oriented objects, and energies that lead a network to learn these representations. The choice of the representation is crucial since the pose of an object has a natural, continuous structure while its category is a discrete variable. We evaluate the different approaches on the joint object detection and pose estimation task of the Pascal3D+ benchmark using Average Viewpoint Precision. We show that a classification approach on discretized viewpoints achieves state-of-the-art performance for joint object detection and pose estimation, and significantly outperforms existing baselines on this benchmark.
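
To make the "classification on discretized viewpoints" idea concrete, the following is a minimal sketch of how a continuous azimuth can be turned into a discrete label that a classifier can predict alongside the object category. It is not the authors' implementation. The function names, the choice to centre bin 0 on 0 degrees, and the flat joint (category, viewpoint) label layout are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def azimuth_to_bin(azimuth_deg: float, num_bins: int) -> int:
    """Map a continuous azimuth (degrees) to a viewpoint bin in [0, num_bins).

    Assumption: bins have equal width and bin 0 is centred on 0 degrees,
    so angles just below 360 wrap around into bin 0.
    """
    width = 360.0 / num_bins
    wrapped = azimuth_deg % 360.0  # handles negative angles and angles >= 360
    return int(math.floor((wrapped + width / 2.0) / width)) % num_bins


def bin_to_azimuth(bin_index: int, num_bins: int) -> float:
    """Return the centre angle (degrees) of a viewpoint bin."""
    return bin_index * 360.0 / num_bins


def joint_label(category_index: int, viewpoint_bin: int, num_bins: int) -> int:
    """Flatten (category, viewpoint bin) into one class index.

    Hypothetical layout: each category owns num_bins consecutive labels,
    so a single softmax could cover every category/viewpoint pair.
    """
    return category_index * num_bins + viewpoint_bin


if __name__ == "__main__":
    for angle in (0.0, 44.0, 180.0, 350.0, -10.0):
        print(angle, "->", azimuth_to_bin(angle, num_bins=8))
```

Discretizing this way trades the continuous structure of pose for a plain classification problem. The wrap-around in `azimuth_to_bin` keeps 350 degrees and 10 degrees in the same bin, which a naive `floor(angle / width)` binning would not do.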

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
