3DRot: 3D Rotation Augmentation for RGB-Based 3D Tasks
This work addresses two problems in RGB-based 3D computer vision: scarce annotations and limited data augmentation. It offers researchers and practitioners a plug-and-play solution that is incremental but effective.
The paper tackles the shortage of geometry-consistent data augmentation for RGB-based 3D tasks by introducing 3DRot, a rotation and mirroring augmentation that preserves projective geometry without requiring scene depth. On monocular 3D detection on the SUN RGB-D dataset, it raises $IoU_{3D}$ from 43.21 to 44.51 and $mAP_{0.5}$ from 35.70 to 38.11.
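Why a rotation can preserve projective geometry without depth follows from the standard pinhole model. This is a textbook argument added for illustration, not a derivation quoted from the paper. For a camera-frame point $X$, intrinsics $K$, its homogeneous pixel $x \simeq K X$, and a rotation $R$ of the camera about its optical center:

$$
x' \simeq K R X = (K R K^{-1})(K X) \simeq K R K^{-1} x
$$

Because $\simeq$ denotes equality up to a nonzero scale, the depth of $X$ cancels. Every pixel moves by the same homography $K R K^{-1}$, so the image can be warped without a depth map while the 3D labels are rotated by $R$ directly.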
RGB-based 3D tasks, e.g., 3D detection, depth estimation, and 3D keypoint estimation, still suffer from scarce, expensive annotations and a thin augmentation toolbox, since most image transforms, including resize and rotation, disrupt geometric consistency. In this paper, we introduce 3DRot, a plug-and-play augmentation that rotates and mirrors images about the camera's optical center while synchronously updating the RGB image, camera intrinsics, object poses, and 3D annotations to preserve projective geometry, achieving geometry-consistent rotations and reflections without relying on any scene depth. We validate 3DRot on a classic 3D task, monocular 3D detection. On the SUN RGB-D dataset, 3DRot raises $IoU_{3D}$ from 43.21 to 44.51, cuts rotation error (ROT) from 22.91$^\circ$ to 20.93$^\circ$, and boosts $mAP_{0.5}$ from 35.70 to 38.11. For comparison, Cube R-CNN, which uses a similar mechanism and test dataset, adds three other datasets to SUN RGB-D for monocular 3D estimation and raises $IoU_{3D}$ from 36.2 to 37.8 and $mAP_{0.5}$ from 34.7 to 35.4. Because it operates purely through camera-space transforms, 3DRot is readily transferable to other 3D tasks.
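The rotate-and-update idea can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, 3D annotations given as camera-frame points, and object poses given as a rotation `R_obj` plus translation `t_obj`. The function names (`rotate_sample`, `mirror_sample`, `project`, `rotation_z`) are hypothetical. Rotation resamples the image with the depth-free homography $K R K^{-1}$ and rotates every 3D quantity by $R$. Mirroring flips the image horizontally, reflects the camera $x$ axis, and moves the principal point to $c_x' = W - 1 - c_x$. The sketch is best suited to in-plane or small rotations, where no pixel maps behind the camera.

```python
import numpy as np

# Reflection of the camera-frame x axis (horizontal mirror).
FLIP = np.diag([-1.0, 1.0, 1.0])


def rotation_z(deg):
    """Rotation matrix about the optical (z) axis, i.e. an in-plane image rotation."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def project(K, X):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (pinhole model)."""
    x = X @ K.T
    return x[:, :2] / x[:, 2:3]


def rotate_sample(image, K, points, R_obj, t_obj, R):
    """Rotate the camera by R about its optical center (hypothetical helper).

    The image is resampled with H = K R K^-1, which needs no depth; K is kept,
    and every 3D quantity is rotated into the new camera frame.
    """
    h, w = image.shape[:2]
    H_inv = np.linalg.inv(K @ R @ np.linalg.inv(K))
    # Inverse warping: map each output pixel back to its source pixel.
    vs, us = np.mgrid[0:h, 0:w]
    dst = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])
    src = H_inv @ dst
    su = np.rint(src[0] / src[2]).astype(int)
    sv = np.rint(src[1] / src[2]).astype(int)
    valid = (su >= 0) & (su < w) & (sv >= 0) & (sv < h)
    out = np.zeros((h * w,) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[sv[valid], su[valid]]  # nearest-neighbour sampling
    return (out.reshape(image.shape), K.copy(),
            points @ R.T, R @ R_obj, R @ t_obj)


def mirror_sample(image, K, points, R_obj, t_obj):
    """Mirror the image left-right and update intrinsics and 3D labels (hypothetical helper)."""
    w = image.shape[1]
    K_new = K.copy()
    K_new[0, 1] = -K[0, 1]           # skew changes sign (usually zero anyway)
    K_new[0, 2] = (w - 1) - K[0, 2]  # principal point mirrored: c_x' = W - 1 - c_x
    # Conjugating by FLIP keeps det(R) = +1; a cuboid is unchanged when its
    # local x axis is flipped, so the mirrored box is still described correctly.
    return (image[:, ::-1].copy(), K_new,
            points @ FLIP.T, FLIP @ R_obj @ FLIP, FLIP @ t_obj)
```

Nearest-neighbour sampling keeps the sketch dependency-free; a practical pipeline would more likely use bilinear interpolation, for example via OpenCV's `warpPerspective`. Pixels that rotate in from outside the original frame are left black.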