CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query
CoopDETR addresses a practical problem for autonomous vehicles: making cooperative perception more efficient at reduced bandwidth. The contribution appears incremental, however, since it builds on existing object query methods.
The paper tackles the trade-off between perception performance and transmission cost in cooperative perception for autonomous vehicles. It proposes CoopDETR, a framework in which agents cooperate at the object level by exchanging object queries. The authors report state-of-the-art performance on OPV2V and V2XSet, with transmission costs reduced to 1/782 of those of previous methods.
Cooperative perception enhances the individual perception capabilities of autonomous vehicles (AVs) by providing a comprehensive view of the environment. However, balancing perception performance and transmission costs remains a significant challenge. Current approaches that transmit region-level features across agents are limited in interpretability and demand substantial bandwidth, making them unsuitable for practical applications. In this work, we propose CoopDETR, a novel cooperative perception framework that introduces object-level feature cooperation via object query. Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, reducing transmission cost while preserving essential information for detection; and cross-agent query fusion, which includes Spatial Query Matching (SQM) and Object Query Aggregation (OQA) to enable effective interaction between queries. Our experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and significantly reduces transmission costs to 1/782 of previous methods.
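To make the idea of object-level cooperation concrete, the sketch below shows one way queries from several agents could be matched spatially and then merged. It is an illustration under stated assumptions, not the paper's SQM or OQA: the abstract does not specify how queries are represented or fused, so the `ObjectQuery` container, the pose format `(tx, ty, yaw)`, the distance threshold `radius`, and the use of greedy nearest-neighbour matching with simple averaging are all hypothetical stand-ins for the learned modules.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectQuery:
    # Hypothetical container: a 2D reference point plus a feature vector.
    ref: Tuple[float, float]
    feat: List[float]


def to_ego_frame(q, pose):
    """Map a query's reference point from a sender's frame into the ego frame.

    pose = (tx, ty, yaw): the sender's position and heading in the ego frame
    (an assumed format, chosen only for this sketch).
    """
    tx, ty, yaw = pose
    x, y = q.ref
    c, s = math.cos(yaw), math.sin(yaw)
    return ObjectQuery((c * x - s * y + tx, s * x + c * y + ty), q.feat)


def match_queries(ego, others, radius=2.0):
    """Greedy nearest-neighbour grouping; a stand-in for spatial matching."""
    groups = [[q] for q in ego]
    for q in others:
        best, best_d = None, radius
        for g in groups:
            # Compare against the first query in each group (its anchor).
            d = math.dist(g[0].ref, q.ref)
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            groups.append([q])  # object seen only by another agent
        else:
            best.append(q)
    return groups


def aggregate(group):
    """Average reference points and features; a stand-in for learned aggregation."""
    n = len(group)
    ref = (sum(q.ref[0] for q in group) / n, sum(q.ref[1] for q in group) / n)
    feat = [sum(vals) / n for vals in zip(*(q.feat for q in group))]
    return ObjectQuery(ref, feat)


def fuse(ego, received):
    """received: list of (queries, pose) pairs, one per cooperating agent."""
    others = [to_ego_frame(q, pose) for queries, pose in received for q in queries]
    return [aggregate(g) for g in match_queries(ego, others)]


# Usage: the ego agent holds one query; a neighbour 10.5 m ahead sends two.
ego = [ObjectQuery((10.0, 0.0), [1.0, 0.0])]
received = [
    ([ObjectQuery((0.0, 0.0), [0.0, 1.0]), ObjectQuery((30.0, 0.0), [2.0, 2.0])],
     (10.5, 0.0, 0.0)),
]
fused = fuse(ego, received)
```

In this toy setup the neighbour's first query lands within `radius` of the ego query and is merged with it, while the second becomes a new object that the ego agent could not see on its own. Only a short list of per-object vectors crosses the network, which is the intuition behind the bandwidth savings claimed for object-level over region-level feature sharing.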