Reason-to-Transmit: Deliberative Adaptive Communication for Cooperative Perception
This work addresses bandwidth-efficient communication for autonomous vehicles in V2X networks, offering incremental improvements over existing selective methods in challenging scenarios.
The paper tackles bandwidth-constrained cooperative perception among autonomous agents by introducing Reason-to-Transmit (R2T), a framework in which a lightweight transformer reasons about what information to transmit. R2T yields gains under high occlusion, where it approaches oracle performance. Across methods, any communication improves performance by about 58% AP over no communication.
Cooperative perception among autonomous agents overcomes the limitations of single-agent sensing, but bandwidth constraints in vehicle-to-everything (V2X) networks require efficient communication policies. Existing approaches rely on reactive mechanisms, such as confidence maps, learned gating, or sparse masks, to decide what to transmit, without reasoning about why a message benefits the receiver. We introduce Reason-to-Transmit (R2T), a framework that equips each agent with a lightweight transformer-based module that reasons over local scene context, estimated neighbor information gaps, and bandwidth budget to make per-region transmission decisions. Trained end-to-end with a bandwidth-aware objective, R2T is evaluated against nine baselines in a multi-agent bird's-eye-view perception environment. Any communication improves performance by about 58% AP over no communication. At low bandwidth, all selective methods perform similarly, but R2T shows clear gains under high occlusion, where information asymmetry is greatest, approaching oracle performance. All methods degrade gracefully under packet drops up to 50%, showing robustness to communication failures. These results indicate that while fusion design dominates performance, deliberative communication provides additional gains in challenging scenarios. R2T introduces a reasoning-based approach to communication, enabling more efficient and context-aware information sharing in cooperative perception.
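The per-region transmission decision described above can be illustrated with a minimal sketch. This is not the authors' implementation: R2T uses a learned transformer module trained end-to-end, whereas the snippet below substitutes a hand-written utility (local informativeness multiplied by the estimated neighbor information gap) and a simple top-k cut to respect the bandwidth budget. The function name `select_regions`, its parameters, and the multiplicative utility are all assumptions made for illustration.

```python
from typing import List


def select_regions(
    local_score: List[float],
    neighbor_gap: List[float],
    budget: int,
) -> List[int]:
    """Pick at most `budget` region indices to transmit.

    local_score[i]  : hypothetical measure of how informative region i is
                      to the sender (e.g. a detection confidence).
    neighbor_gap[i] : hypothetical estimate of how much the receiver lacks
                      information about region i (e.g. due to occlusion).
    budget          : maximum number of regions that may be sent.
    """
    if len(local_score) != len(neighbor_gap):
        raise ValueError("local_score and neighbor_gap must have equal length")

    # Assumed utility: a region is worth sending only if the sender sees
    # something there AND the receiver is estimated to be missing it.
    utility = [s * g for s, g in zip(local_score, neighbor_gap)]

    # Rank regions by utility, highest first.
    ranked = sorted(range(len(utility)), key=lambda i: utility[i], reverse=True)

    # Enforce the bandwidth budget and skip regions with no expected benefit.
    k = max(0, min(budget, len(ranked)))
    return sorted(i for i in ranked[:k] if utility[i] > 0)


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.8, 0.5]
    gaps = [0.1, 0.9, 0.7, 0.0]
    print(select_regions(scores, gaps, budget=2))
```

In this toy example, region 0 has the highest local score but is skipped because the neighbor is estimated to see it already, which is the kind of receiver-aware choice that distinguishes deliberative selection from purely confidence-based gating.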