MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection
MixSup addresses the annotation burden of LiDAR-based 3D object detection for autonomous driving, offering a practical, incremental improvement over existing weakly supervised and semi-supervised methods.
The paper tackles label-efficient 3D object detection from LiDAR point clouds by proposing MixSup, a method that uses mixed-grained supervision with cheap cluster-level labels for semantics and limited box-level labels for geometry, achieving up to 97.31% of fully supervised performance with only 10% box annotations.
Label-efficient LiDAR-based 3D object detection is currently dominated by weakly/semi-supervised methods. Instead of exclusively following one of them, we propose MixSup, a more practical paradigm that simultaneously utilizes massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision. We start by observing that point clouds are usually textureless, making it hard to learn semantics. However, point clouds are geometrically rich and scale-invariant with respect to the distance from the sensor, making it relatively easy to learn the geometry of objects, such as poses and shapes. Thus, MixSup leverages massive coarse cluster-level labels to learn semantics and a few expensive box-level labels to learn accurate poses and shapes. We redesign the label assignment in mainstream detectors, which allows them to be seamlessly integrated into MixSup, making the approach practical and universal. We validate its effectiveness on nuScenes, Waymo Open Dataset, and KITTI, employing various detectors. MixSup achieves up to 97.31% of fully supervised performance, using cheap cluster annotations and only 10% box annotations. Furthermore, we propose PointSAM, based on the Segment Anything Model, for automated coarse labeling, further reducing the annotation burden. The code is available at https://github.com/BraveGroup/PointSAM-for-MixSup.
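To make the split between semantic and geometric supervision concrete, the following is a minimal sketch of how per-point training targets might be built from mixed-grained labels. It is not the label assignment from the MixSup code base. The function name `assign_mixed_targets`, the dictionary inputs, and the 7-value box layout (center, size, yaw) are all assumptions chosen for illustration. The idea it shows is only this: every labeled cluster supplies a class target, while a regression target is set only for the subset of clusters that also carry a box.

```python
import numpy as np

def assign_mixed_targets(cluster_ids, cluster_classes, cluster_boxes):
    """Build per-point targets from mixed-grained labels (illustrative only).

    cluster_ids:     (N,) int array, cluster index per point, -1 for background.
    cluster_classes: dict {cluster id: class index}, coarse label for every cluster.
    cluster_boxes:   dict {cluster id: 7 box parameters}, given for a subset only.
    """
    n = cluster_ids.shape[0]
    cls_target = np.full(n, -1, dtype=np.int64)      # -1 marks background
    box_target = np.zeros((n, 7), dtype=np.float32)  # hypothetical (x, y, z, l, w, h, yaw)
    box_mask = np.zeros(n, dtype=bool)               # True where a box loss would apply

    for cid, cls in cluster_classes.items():
        member = cluster_ids == cid
        cls_target[member] = cls                     # semantics from the cheap cluster label
        box = cluster_boxes.get(cid)
        if box is not None:                          # geometry only from box-labeled clusters
            box_target[member] = np.asarray(box, dtype=np.float32)
            box_mask[member] = True
    return cls_target, box_target, box_mask

# Toy scene: two clusters plus one background point; only cluster 0 has a box.
cluster_ids = np.array([0, 0, 1, 1, 1, -1])
cluster_classes = {0: 2, 1: 0}
cluster_boxes = {0: [1.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.1]}
cls_target, box_target, box_mask = assign_mixed_targets(
    cluster_ids, cluster_classes, cluster_boxes
)
```

In a detector, a classification loss would then be computed from `cls_target` on all points, and a box regression loss only where `box_mask` is true. How actual detectors match predictions to clusters and boxes is more involved than this point-level view.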