Deep Cuboid Detection: Beyond 2D Bounding Boxes
This work addresses the need for real-time 3D object detection in applications such as augmented reality and robotics. The contribution is incremental, since it builds on existing deep learning approaches.
The paper tackles the detection of 3D cuboids in cluttered RGB images. It proposes an end-to-end deep learning system that localizes each cuboid with a 2D bounding box and corner keypoints, and it reports a significant improvement over its baseline method.
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Unlike classical approaches, which fit a 3D model from low-level cues such as corners, edges, and vanishing points, we propose an end-to-end deep learning system that detects cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize each cuboid with a 2D bounding box and simultaneously localize its corners, effectively producing a 3D interpretation of box-like objects. We refine the keypoints by iteratively pooling convolutional features, which significantly improves on the baseline method. Our deep cuboid detector is trained end-to-end and is suitable for real-time applications in augmented reality (AR) and robotics.
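To make the output representation and the iterative refinement concrete, the following is a minimal sketch and not the authors' implementation. It assumes a detection is stored as a 2D box plus eight corner keypoints in pixel coordinates, and that refinement re-predicts the corners from features pooled over an updated box. The names `CuboidDetection`, `enclosing_box`, `refine_cuboid`, and the `predict_corners` callback (a stand-in for "pool convolutional features inside the box and regress the corners"), as well as the `margin` and `num_iters` parameters, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]               # (x, y) in image pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


@dataclass
class CuboidDetection:
    """One detected cuboid: a 2D box plus its projected corner keypoints."""
    box: Box
    corners: List[Point]  # assumed layout: 8 corners in image coordinates
    score: float


def enclosing_box(corners: List[Point], margin: float = 0.0) -> Box:
    """Axis-aligned box around the corners, padded by `margin` pixels."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


def refine_cuboid(
    det: CuboidDetection,
    predict_corners: Callable[[Box], List[Point]],
    num_iters: int = 2,
    margin: float = 4.0,
) -> CuboidDetection:
    """Iteratively re-predict corners from the current box.

    `predict_corners` is a hypothetical stand-in for the network step that
    pools convolutional features inside `box` and regresses the corners.
    After each prediction the box is re-fit around the new corners, so the
    next iteration pools features from a better-aligned region.
    """
    for _ in range(num_iters):
        corners = predict_corners(det.box)
        det = CuboidDetection(
            box=enclosing_box(corners, margin),
            corners=corners,
            score=det.score,
        )
    return det
```

In this sketch the refinement loop is independent of the network: any callable that maps a box to eight corner estimates can be plugged in, which keeps the data flow (box, pooled features, corners, updated box) easy to inspect.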