FocalComm: Hard Instance-Aware Multi-Agent Perception
FocalComm addresses the safety of vulnerable road users in autonomous driving. It is an incremental improvement that applies a novel method to a known bottleneck.
The paper tackles multi-agent collaborative perception in autonomous driving, which underperforms on small, safety-critical objects such as pedestrians. It proposes FocalComm, a framework in which agents exchange hard-instance-oriented features. FocalComm outperforms state-of-the-art methods on real-world datasets and shows strong gains in pedestrian detection.
Multi-agent collaborative perception (CP) is a promising paradigm for improving autonomous driving safety, particularly for vulnerable road users like pedestrians, via robust 3D perception. However, existing CP approaches often optimize for vehicle detection performance metrics, underperforming on smaller, safety-critical objects such as pedestrians, where detection failures can be catastrophic. Furthermore, previous CP methods rely on full feature exchange rather than communicating only salient features that help reduce false negatives. To this end, we present FocalComm, a novel collaborative perception framework that focuses on exchanging hard-instance-oriented features among connected collaborative agents. FocalComm consists of two key novel designs: (1) a learnable progressive hard instance mining (HIM) module to extract hard instance-oriented features per agent, and (2) a query-based feature-level (intermediate) fusion technique that dynamically weights these identified features during collaboration. We show that FocalComm outperforms state-of-the-art collaborative perception methods on two challenging real-world datasets (V2X-Real and DAIR-V2X) across both vehicle-centric and infrastructure-centric collaborative setups. FocalComm also shows a strong performance gain in pedestrian detection in V2X-Real.
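To make the two designs more concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It rests on several assumptions that do not come from the paper: the function names `select_hard_cells` and `fuse_sparse_messages` are hypothetical, the per-cell `hardness` score is a hand-supplied stand-in for the output of the learned HIM module, the selection is a single top-k step rather than a progressive one, and a per-cell softmax over agents stands in for the query-based dynamic weighting.

```python
import numpy as np


def select_hard_cells(features, hardness, k):
    """Pick the k feature-map cells with the highest hardness score.

    features: (H, W, C) feature map of one agent.
    hardness: (H, W) score per cell; ASSUMED stand-in for a learned HIM output.
    Returns (flat cell indices, selected features, selected scores).
    """
    flat_scores = hardness.reshape(-1)
    flat_feats = features.reshape(-1, features.shape[-1])
    idx = np.argsort(flat_scores)[::-1][:k]  # indices of the k largest scores
    return idx, flat_feats[idx], flat_scores[idx]


def fuse_sparse_messages(ego_features, ego_logits, messages):
    """Fuse sparse collaborator features into the ego map.

    ego_features: (H, W, C) ego feature map.
    ego_logits:   (H, W) finite weighting logits for the ego agent.
    messages:     list of (idx, feats, scores) tuples from collaborators.
    Each cell is a softmax-weighted sum over the agents that cover it
    (ASSUMED weighting scheme, used here only for illustration).
    """
    h, w, c = ego_features.shape
    n = h * w
    feats = [ego_features.reshape(n, c)]
    logits = [ego_logits.reshape(n)]
    for idx, f, s in messages:
        dense_f = np.zeros((n, c))
        dense_s = np.full(n, -np.inf)  # -inf gives zero weight to unsent cells
        dense_f[idx] = f
        dense_s[idx] = s
        feats.append(dense_f)
        logits.append(dense_s)
    feats = np.stack(feats)    # (agents, n, C)
    logits = np.stack(logits)  # (agents, n)
    # Numerically stable softmax across the agent axis.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    fused = (weights[..., None] * feats).sum(axis=0)
    return fused.reshape(h, w, c)
```

In this sketch a collaborator would call `select_hard_cells` on its own feature map and transmit only the returned triple, so the message size scales with `k` rather than with the full map. The ego agent then calls `fuse_sparse_messages`, and cells that no collaborator sent keep the ego features unchanged.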