CV · LG · Jun 7, 2020

CubifAE-3D: Monocular Camera Space Cubification for Auto-Encoder based 3D Object Detection

arXiv:2006.04080v2 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses 3D object detection for autonomous vehicles using only monocular images. The advance is incremental, since it builds on existing auto-encoder and cubification techniques.

The paper tackles monocular 3D object detection by pre-training an auto-encoder on synthetic RGB-depth data and using its latent embedding to train a 3D detector over a cubified camera space, with results reported on the Virtual KITTI 2 and KITTI datasets.

We introduce a method for 3D object detection using a single monocular image. Starting from a synthetic dataset, we pre-train an RGB-to-Depth Auto-Encoder (AE). The embedding learnt from this AE is then used to train a 3D Object Detector (3DOD) CNN, which regresses the parameters of 3D object poses from the latent embedding that the AE encoder generates from the RGB image. We show that we can pre-train the AE once using paired RGB and depth images from simulation data and subsequently train only the 3DOD network using real data, comprising RGB images and 3D object pose labels (without the requirement of dense depth). Our 3DOD network utilizes a particular 'cubification' of 3D space around the camera, where each cuboid is tasked with predicting N object poses, along with their class and confidence values. The AE pre-training and this method of dividing the 3D space around the camera into cuboids give our method its name: CubifAE-3D. We demonstrate results for monocular 3D object detection in the Autonomous Vehicle (AV) use-case with the Virtual KITTI 2 and the KITTI datasets.
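
To make the 'cubification' idea concrete, the following is a minimal sketch of how object centres in camera space might be binned into a regular grid of cuboids, with at most N objects kept per cuboid. It is an illustration only, not the authors' implementation: the `CubeGrid` class, the `assign_objects` helper, the grid extents, the cuboid counts, and the axis convention (x right, y down, z forward, in metres) are all assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeGrid:
    """Hypothetical regular grid of cuboids in front of the camera."""
    # Assumed extents in camera coordinates (metres); not taken from the paper.
    x_range: tuple = (-40.0, 40.0)
    y_range: tuple = (-5.0, 5.0)
    z_range: tuple = (0.0, 100.0)
    # Assumed number of cuboids along x, y, z.
    shape: tuple = (4, 1, 4)

    def cell_of(self, center):
        """Return the (ix, iy, iz) cuboid index containing `center`, or None if outside."""
        ranges = (self.x_range, self.y_range, self.z_range)
        index = []
        for value, (lo, hi), n in zip(center, ranges, self.shape):
            if not lo <= value < hi:
                return None
            # Scale to [0, n) and clamp to guard against float rounding at the upper edge.
            index.append(min(int((value - lo) / (hi - lo) * n), n - 1))
        return tuple(index)


def assign_objects(grid, centers, max_per_cell):
    """Group object centres by cuboid, keeping at most `max_per_cell` per cuboid.

    `max_per_cell` plays the role of N in the abstract; objects beyond the cap
    or outside the grid are simply dropped in this sketch.
    """
    cells = {}
    for center in centers:
        cell = grid.cell_of(center)
        if cell is None:
            continue
        slots = cells.setdefault(cell, [])
        if len(slots) < max_per_cell:
            slots.append(center)
    return cells
```

In a detector built along these lines, each occupied cuboid would supply up to N regression targets (pose, class, confidence) and empty slots would be trained towards zero confidence; how the paper actually encodes these targets is not stated in the abstract.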

