CV | Oct 31, 2025

NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception

arXiv:2510.27647v1 | 3 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in multi-agent systems for autonomous driving by enabling better collaboration among diverse agents, though it is an incremental improvement over existing alignment methods.

The paper tackles the problem of domain gaps in collaborative perception caused by heterogeneous agents with fixed perception models, proposing NegoCollab to negotiate a common representation that reduces these gaps and improves performance, achieving competitive results on benchmarks like OPV2V and V2X-Sim.

Collaborative perception improves task performance by expanding the perception range through information sharing among agents. Immutable heterogeneity poses a significant challenge in collaborative perception, as participating agents may employ different and fixed perception models. This leads to domain gaps in the intermediate features shared among agents, consequently degrading collaborative performance. Aligning the features of all agents to a common representation can eliminate domain gaps with low training cost. However, in existing methods, the common representation is designated as the representation of a specific agent, making it difficult for agents with significant domain discrepancies from this specific agent to achieve proper alignment. This paper proposes NegoCollab, a heterogeneous collaboration method based on a negotiated common representation. It introduces a negotiator during training to derive the common representation from the local representations of the agents of each modality, effectively reducing the inherent domain gap between the common representation and the various local representations. In NegoCollab, the mutual transformation of features between the local representation space and the common representation space is achieved by a sender and receiver pair. To better align local representations to the common representation containing multimodal information, we introduce a structural alignment loss and a pragmatic alignment loss in addition to the distribution alignment loss to supervise the training. This enables the knowledge in the common representation to be fully distilled into the sender.
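
The sender and receiver data flow described in the abstract can be pictured with a toy sketch. Everything below is an assumption made for illustration: the `AffineMap` and `Agent` classes, the numeric values, and the use of mean squared error as a stand-in for an alignment loss are hypothetical and are not taken from the paper, whose sender, receiver, and negotiator are learned networks whose architectures the abstract does not specify.

```python
# Illustrative sketch only: the affine maps, names, values, and the MSE
# stand-in are assumptions, not the NegoCollab implementation.
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class AffineMap:
    """Hypothetical per-channel affine transform standing in for a learned network."""
    scale: Vector
    shift: Vector

    def __call__(self, x: Vector) -> Vector:
        return [s * v + b for s, v, b in zip(self.scale, x, self.shift)]


@dataclass
class Agent:
    name: str
    sender: AffineMap    # local representation space -> common space
    receiver: AffineMap  # common space -> local representation space

    def send(self, local_feature: Vector) -> Vector:
        return self.sender(local_feature)

    def receive(self, common_feature: Vector) -> Vector:
        return self.receiver(common_feature)


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error, used here only as a simple stand-in for an alignment loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two agents whose local feature spaces differ by scale and offset (made-up values).
lidar_agent = Agent("lidar", AffineMap([2.0, 2.0], [0.0, 0.0]),
                    AffineMap([0.5, 0.5], [0.0, 0.0]))
camera_agent = Agent("camera", AffineMap([1.0, 1.0], [-1.0, -1.0]),
                     AffineMap([1.0, 1.0], [1.0, 1.0]))

# The LiDAR agent shares a feature in the common space; the camera agent
# then maps it into its own local space before fusing it.
shared = lidar_agent.send([0.5, 1.5])
received = camera_agent.receive(shared)

# A round trip through one agent's own sender and receiver recovers the input.
round_trip = lidar_agent.receive(lidar_agent.send([0.5, 1.5]))
```

The point of the sketch is the interface: each agent only needs its own sender and receiver pair to talk to any other agent, because all shared features pass through the same common space rather than through one designated agent's representation.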

