CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception
This work addresses the limited perception range of single-vehicle autonomous driving systems, with the aim of improving safety and efficiency. The contribution appears incremental, however, as it extends existing cooperative perception to sequential tasks.
The paper tackles cooperative sequential perception for autonomous driving by proposing CoopTrack, an end-to-end framework for cooperative 3D multi-object tracking, achieving state-of-the-art results with 39.0% mAP and 32.8% AMOTA on the V2X-Seq dataset.
Cooperative perception aims to address the inherent limitations of single-vehicle autonomous driving systems through information exchange among multiple agents. Previous research has primarily focused on single-frame perception tasks. However, the more challenging cooperative sequential perception tasks, such as cooperative 3D multi-object tracking, have not been thoroughly investigated. Therefore, we propose CoopTrack, a fully instance-level end-to-end framework for cooperative tracking, featuring learnable instance association, which fundamentally differs from existing approaches. CoopTrack transmits sparse instance-level features that significantly enhance perception capabilities while maintaining low transmission costs. Furthermore, the framework comprises two key components: Multi-Dimensional Feature Extraction, and Cross-Agent Association and Aggregation, which collectively enable comprehensive instance representation with semantic and motion features, and adaptive cross-agent association and fusion based on a feature graph. Experiments on both the V2X-Seq and Griffin datasets demonstrate that CoopTrack achieves excellent performance. Specifically, it attains state-of-the-art results on V2X-Seq, with 39.0% mAP and 32.8% AMOTA. The project is available at https://github.com/zhongjiaru/CoopTrack.
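To make the idea of instance-level cross-agent association and aggregation more concrete, the following is a minimal sketch and not the CoopTrack implementation. It assumes each agent contributes a short list of per-instance feature vectors, and it swaps the paper's learnable, graph-based association for a simple greedy match on cosine similarity followed by feature averaging. The function names (`cosine`, `associate`, `aggregate`), the `threshold` parameter, and the toy feature values are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def associate(ego, other, threshold=0.8):
    """Greedy one-to-one matching of instances across two agents.

    Assumption: a fixed similarity threshold stands in for the learned
    association described in the abstract.
    """
    scored = sorted(
        ((cosine(e, o), i, j) for i, e in enumerate(ego) for j, o in enumerate(other)),
        reverse=True,
    )
    used_ego, used_other, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used_ego or j in used_other:
            continue  # each instance may be matched at most once
        used_ego.add(i)
        used_other.add(j)
        matches.append((i, j))
    return sorted(matches)


def aggregate(ego, other, matches):
    """Average matched features; keep unmatched instances from both agents."""
    partner = dict(matches)
    matched_other = set(partner.values())
    fused = []
    for i, e in enumerate(ego):
        if i in partner:
            o = other[partner[i]]
            fused.append([(x + y) / 2 for x, y in zip(e, o)])
        else:
            fused.append(list(e))
    # Instances seen only by the other agent extend the ego agent's view.
    fused.extend(list(o) for j, o in enumerate(other) if j not in matched_other)
    return fused


# Toy example: two ego instances, two instances from a second agent.
ego_feats = [[1.0, 0.0], [0.0, 1.0]]
other_feats = [[0.9, 0.1], [-1.0, 0.0]]
matches = associate(ego_feats, other_feats)
fused = aggregate(ego_feats, other_feats, matches)
```

In this toy case only the first instance of each agent is similar enough to be matched, so the fused set holds three instances: one averaged, one seen only by the ego agent, and one seen only by the second agent. Because only a few short vectors per agent are exchanged rather than dense feature maps, the sketch also hints at why instance-level transmission keeps communication costs low.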