StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection
This work addresses a critical bottleneck in cooperative object detection for autonomous vehicles, though it is incremental by building on existing datasets and methods.
The paper tackles the problem of asynchronous sensor times in cooperative perception for autonomous driving, which can cause over one meter of object misplacement, and proposes a fully sparse framework that improves efficiency over dense models.
Cooperative perception via communication among intelligent traffic agents has great potential to improve the safety of autonomous driving. However, limited communication bandwidth, localization errors and asynchronized capturing time of sensor data, all introduce difficulties to the data fusion of different agents. To some extend, previous works have attempted to reduce the shared data size, mitigate the spatial feature misalignment caused by localization errors and communication delay. However, none of them have considered the asynchronized sensor ticking times, which can lead to dynamic object misplacement of more than one meter during data fusion. In this work, we propose Time-Aligned COoperative Object Detection (TA-COOD), for which we adapt widely used dataset OPV2V and DairV2X with considering asynchronous LiDAR sensor ticking times and build an efficient fully sparse framework with modeling the temporal information of individual objects with query-based techniques. The experiment results confirmed the superior efficiency of our fully sparse framework compared to the state-of-the-art dense models. More importantly, they show that the point-wise observation timestamps of the dynamic objects are crucial for accurate modeling the object temporal context and the predictability of their time-related locations. The official code is available at \url{https://github.com/YuanYunshuang/CoSense3D}.