Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes
This work addresses the need for more efficient and accurate 3D object detection in autonomous driving systems, representing an incremental improvement by enhancing existing detectors with relation modules.
The paper tackles the inefficient exploitation of object relations in LiDAR-based 3D object detection for driving scenes by introducing Ret3D, a two-stage detector with spatial and temporal relation modules. On the Waymo Open Dataset, Ret3D achieves state-of-the-art vehicle detection, with LEVEL 1 and LEVEL 2 mAPH that are 5.5% and 3.2% higher, respectively, than the recent competitor.
Current efficient LiDAR-based detection frameworks fall short in exploiting object relations, which are naturally present in both the spatial and temporal dimensions. To this end, we introduce a simple, efficient, and effective two-stage detector, termed Ret3D. At the core of Ret3D are novel intra-frame and inter-frame relation modules that capture spatial and temporal relations, respectively. More specifically, the intra-frame relation module (IntraRM) encapsulates the objects within a frame into a sparse graph, which allows us to refine the object features through efficient message passing. The inter-frame relation module (InterRM), in turn, dynamically and densely connects each object to the other objects in its corresponding tracked sequence, and leverages this temporal information to further enhance the object's representation efficiently through a lightweight transformer network. We instantiate our designs of IntraRM and InterRM with general center-based or anchor-based detectors and evaluate them on the Waymo Open Dataset (WOD). With negligible extra overhead, Ret3D achieves state-of-the-art performance, being 5.5% and 3.2% higher than the recent competitor in terms of the LEVEL 1 and LEVEL 2 mAPH metrics on vehicle detection, respectively.
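
To make the two relation modules concrete, the following is a minimal, illustrative sketch in plain Python, not the implementation used in Ret3D. It assumes a k-nearest-neighbor rule for building the sparse intra-frame graph, mean aggregation with a residual connection for message passing, and a single-head scaled dot-product attention without learned projections as a stand-in for the lightweight transformer. The function names `knn_edges`, `message_passing`, and `temporal_attention` are hypothetical and chosen only for this example.

```python
import math


def knn_edges(centers, k):
    """Build a sparse graph: link each object to its k nearest neighbours.

    Assumption: kNN on box centers; the abstract does not specify the
    actual graph construction rule.
    """
    edges = {}
    for i, ci in enumerate(centers):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


def message_passing(features, edges):
    """One round of message passing over the sparse graph.

    Assumption: each object adds the mean of its neighbours' features to
    its own feature (a residual update with no learned weights).
    """
    refined = []
    for i, feat in enumerate(features):
        nbrs = edges[i]
        if not nbrs:
            refined.append(list(feat))
            continue
        mean = [
            sum(features[j][d] for j in nbrs) / len(nbrs)
            for d in range(len(feat))
        ]
        refined.append([a + b for a, b in zip(feat, mean)])
    return refined


def temporal_attention(query, history):
    """Attend from the current-frame feature over the object's tracked history.

    Assumption: single-head scaled dot-product attention without learned
    projections, standing in for a lightweight transformer layer.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * h for q, h in zip(query, past)) / scale for past in history]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * past[d] for w, past in zip(weights, history))
        for d in range(len(query))
    ]


# Toy frame: two nearby objects and one distant object (2D centers).
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

edges = knn_edges(centers, k=1)
refined = message_passing(features, edges)

# Toy track: the same object's features from two earlier frames.
fused = temporal_attention(refined[0], [[1.0, 0.0], [0.0, 1.0]])
```

Because each object only exchanges messages with its `k` neighbors, the intra-frame step in this sketch costs on the order of `N * k` feature updates rather than `N * N`, which illustrates why a sparse graph keeps the overhead small. The temporal step, by contrast, attends densely, but only within a single object's track, so its cost grows with track length rather than with the number of objects in the scene.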