CV · Mar 22, 2017

Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image

arXiv:1703.07570v1 · 461 citations
Originality: Highly original
AI Analysis

This work addresses comprehensive vehicle analysis for autonomous driving systems. It is a novel integration of multiple tasks rather than an incremental improvement.

The paper tackles joint 2D and 3D vehicle analysis from monocular images with Deep MANTA, a coarse-to-fine many-task network that simultaneously performs vehicle detection, part localization, visibility characterization, and 3D dimension estimation. On the KITTI benchmark it outperforms monocular state-of-the-art methods on vehicle detection, orientation, and 3D localization.
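The coarse-to-fine idea can be illustrated with a small sketch of iterative box refinement. It is not the paper's implementation. The sketch assumes a Faster R-CNN-style (center, log-size) delta parametrization, uses three refinement levels purely for illustration, and replaces the network's per-level regression heads with hypothetical callables (`heads`). In a real network each head would presumably regress its deltas from features pooled at the current boxes. The `oracle_head` in the demo is a hypothetical stand-in that moves halfway toward a known ground-truth box.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2) in pixels
Delta = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def _center_size(b: Box) -> Tuple[float, float, float, float]:
    w, h = b[2] - b[0], b[3] - b[1]
    return b[0] + 0.5 * w, b[1] + 0.5 * h, w, h


def apply_deltas(box: Box, d: Delta) -> Box:
    """Decode one regression step (Faster R-CNN-style parametrization, assumed)."""
    cx, cy, w, h = _center_size(box)
    cx, cy = cx + d[0] * w, cy + d[1] * h
    w, h = w * math.exp(d[2]), h * math.exp(d[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def encode(box: Box, target: Box) -> Delta:
    """Inverse of apply_deltas: the delta that maps `box` onto `target`."""
    cx, cy, w, h = _center_size(box)
    tx, ty, tw, th = _center_size(target)
    return ((tx - cx) / w, (ty - cy) / h, math.log(tw / w), math.log(th / h))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def coarse_to_fine(proposal: Box,
                   heads: Sequence[Callable[[Box], Delta]]) -> List[Box]:
    """Refine a proposal level by level; each head sees the previous level's box."""
    boxes = [proposal]
    for head in heads:
        boxes.append(apply_deltas(boxes[-1], head(boxes[-1])))
    return boxes


def oracle_head(gt: Box, step: float = 0.5) -> Callable[[Box], Delta]:
    """Hypothetical stand-in for a trained head: moves `step` of the way to gt."""
    return lambda b: tuple(step * x for x in encode(b, gt))


if __name__ == "__main__":
    gt = (20.0, 10.0, 80.0, 70.0)
    boxes = coarse_to_fine((0.0, 0.0, 40.0, 40.0), [oracle_head(gt)] * 3)
    for level, b in enumerate(boxes):
        print(level, [round(v, 1) for v in b], round(iou(b, gt), 3))
```

In this toy run the IoU with the ground-truth box rises at every level. That is the intuition behind refining proposals in stages rather than regressing them once.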

In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. At inference time, the network's outputs are used by a real-time robust pose estimation algorithm for fine orientation estimation and 3D vehicle localization. We show in experiments that our method outperforms monocular state-of-the-art approaches on vehicle detection, orientation and 3D location tasks on the very challenging KITTI benchmark.
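The final inference step, which recovers orientation and 3D location from the predicted 2D parts, can be sketched as a 2D/3D matching problem. This is a simplified stand-in and not the paper's solver. It assumes a single rigid 3D template that is already scaled to the predicted dimensions, known camera intrinsics `K = (fx, fy, cx, cy)`, and rotation about the vertical axis only (a vehicle on flat ground). For each candidate yaw on a grid, the translation comes from a linear least-squares fit, and the yaw with the lowest reprojection error is kept. Because the network also predicts coordinates for parts that are not visible, every template point can take part in the fit. The template, intrinsics, and pose used in the demo are hypothetical.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed known


def rotate_y(p: Point3, yaw: float) -> Point3:
    """Rotate a template point about the vertical (y) axis by `yaw` radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)


def project(p: Point3, K: Intrinsics) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: List[List[float]], b: List[float]) -> Point3:
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = _det3(a)
    sol = []
    for col in range(3):
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        sol.append(_det3(m) / d)
    return (sol[0], sol[1], sol[2])


def translation_for_yaw(template, parts, yaw: float, K: Intrinsics) -> Point3:
    """Least-squares t for a fixed yaw, using (u-cx)*Z = fx*X and (v-cy)*Z = fy*Y."""
    fx, fy, cx, cy = K
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for p, (u, v) in zip(template, parts):
        rx, ry, rz = rotate_y(p, yaw)
        # Each 2D part gives two equations that are linear in (tx, ty, tz).
        rows = (([fx, 0.0, -(u - cx)], (u - cx) * rz - fx * rx),
                ([0.0, fy, -(v - cy)], (v - cy) * rz - fy * ry))
        for a, b in rows:
            for i in range(3):
                atb[i] += a[i] * b
                for j in range(3):
                    ata[i][j] += a[i] * a[j]
    return _solve3(ata, atb)  # normal equations (A^T A) t = A^T b


def reprojection_error(template, parts, yaw: float, t: Point3, K: Intrinsics) -> float:
    """Sum of squared pixel distances between projected template and parts."""
    err = 0.0
    for p, (u, v) in zip(template, parts):
        q = tuple(a + b for a, b in zip(rotate_y(p, yaw), t))
        if q[2] <= 1e-6:  # point behind the camera: reject this hypothesis
            return math.inf
        pu, pv = project(q, K)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err


def estimate_pose(template, parts, K: Intrinsics, steps: int = 720):
    """Grid search over yaw; returns the (yaw, t) with lowest reprojection error."""
    best_err, best_yaw, best_t = math.inf, 0.0, (0.0, 0.0, 0.0)
    for k in range(steps):
        yaw = -math.pi + 2.0 * math.pi * k / steps
        t = translation_for_yaw(template, parts, yaw, K)
        err = reprojection_error(template, parts, yaw, t, K)
        if err < best_err:
            best_err, best_yaw, best_t = err, yaw, t
    return best_yaw, best_t


def synthesize_parts(template, yaw: float, t: Point3, K: Intrinsics):
    """Project the template at a known pose (stands in for network output)."""
    return [project(tuple(a + b for a, b in zip(rotate_y(p, yaw), t)), K)
            for p in template]


# Hypothetical car template (metres, vehicle frame, y pointing down).
TEMPLATE = [
    (0.8, 0.0, 2.0), (-0.8, 0.0, 2.0), (0.8, 0.0, -2.0), (-0.8, 0.0, -2.0),
    (0.7, -1.4, 0.5), (-0.7, -1.4, 0.5), (0.7, -1.4, -1.0), (-0.7, -1.4, -1.0),
]
K_EXAMPLE = (721.5, 721.5, 609.6, 172.9)  # KITTI-like intrinsics, assumed

if __name__ == "__main__":
    parts = synthesize_parts(TEMPLATE, 0.7, (2.0, 1.6, 20.0), K_EXAMPLE)
    yaw, t = estimate_pose(TEMPLATE, parts, K_EXAMPLE)
    print(f"yaw={yaw:.3f} rad, t=({t[0]:.2f}, {t[1]:.2f}, {t[2]:.2f}) m")
```

For a fixed yaw, translation is linear in the unknowns, so only one angle has to be searched. A general solver such as EPnP would recover the full 3D rotation instead, at the cost of more machinery.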
