48.5 | CV | May 18 | Code
One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception
Yang Li, Weize Li, Quan Yuan et al.
By sharing intermediate features, collaborative perception extends each agent's sensing beyond standalone limits, but real-world feature modality heterogeneity remains a key barrier to effective fusion. Most existing methods, including direct adaptation and protocol-based transformation, rely on training adapters for newly emerging feature modalities and often require additional retraining or fine-tuning. Such repeated training is costly and is often infeasible across manufacturers due to model and data privacy constraints, limiting real-world scalability. To address this issue, we propose UniTrans, a universal any-to-any feature modality translation model that instantiates translators on the fly for arbitrary modalities. UniTrans pretrains a bank of translator expert parameters and learns their combination coefficients as a function of the source-to-target modality mapping. The mapping is measured in a modality-intrinsic latent space, where an intrinsic encoder extracts modality-specific yet scene-invariant codes from single-frame intermediate features, enabling UniTrans to instantiate translators in a zero-shot manner. Experiments on OPV2V-H and DAIR-V2X demonstrate that UniTrans consistently outperforms state-of-the-art methods in both simulated and real-world settings, enabling efficient any-to-any translation through a universal model. The code is available at https://github.com/CheeryLeeyy/UniTrans.
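The idea of combining a bank of expert parameters with coefficients derived from a source-to-target modality pair can be illustrated with a minimal sketch. This is not the UniTrans implementation: the function names, the linear gate, the softmax normalization, and the use of small channel-mixing matrices as "experts" are all assumptions made for illustration only.

```python
import math

def softmax(scores):
    """Normalize raw scores into coefficients that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixing_coefficients(src_code, tgt_code, gate):
    """Hypothetical gate: one score per expert from the concatenated
    (source, target) modality codes, followed by a softmax."""
    pair = src_code + tgt_code
    scores = [sum(w * v for w, v in zip(row, pair)) for row in gate]
    return softmax(scores)

def instantiate_translator(expert_bank, coeffs):
    """Weighted sum of expert matrices, giving one translator matrix."""
    rows, cols = len(expert_bank[0]), len(expert_bank[0][0])
    return [
        [sum(c * e[r][k] for c, e in zip(coeffs, expert_bank)) for k in range(cols)]
        for r in range(rows)
    ]

def translate(feature, translator):
    """Apply the translator to one feature vector (channel mixing)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in translator]

# Toy bank of two experts (assumed shapes): identity and channel swap.
expert_bank = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
gate = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # hypothetical gate weights
coeffs = mixing_coefficients([0.0, 0.0], [0.0, 0.0], gate)
translator = instantiate_translator(expert_bank, coeffs)
translated = translate([2.0, 4.0], translator)
```

Because the translator is assembled from pretrained experts and a coefficient vector, a new modality pair only needs its codes to obtain a translator in this sketch; no expert is retrained.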
CV | Sep 12, 2024
CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model
Yang Li, Quan Yuan, Guiyang Luo et al.
By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly use CNNs or Transformers to learn feature representation and fusion in the spatial dimension, and these struggle to handle long-range spatial-temporal features under limited computing and communication resources. Holistically modeling the dependencies over extensive spatial areas and extended temporal frames is crucial to enhancing feature quality. To this end, we propose a resource-efficient cross-agent spatial-temporal collaborative state space model (SSM), named CollaMamba. First, we construct a foundational backbone network based on spatial SSM. This backbone captures positional causal dependencies from both single-agent and cross-agent views, yielding compact and comprehensive intermediate features while maintaining linear complexity. Furthermore, we devise a history-aware feature boosting module based on temporal SSM, extracting contextual cues from extended historical frames to refine vague features while preserving low overhead. Extensive experiments across several datasets demonstrate that CollaMamba outperforms state-of-the-art methods, achieving higher model accuracy while reducing computational and communication overhead by up to 71.9% and 1/64, respectively. This work pioneers the exploration of Mamba's potential in collaborative perception. The source code will be made available.
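The linear complexity mentioned above follows from the recurrent form of a state space model, which visits each sequence element once. The sketch below shows a generic scalar SSM recurrence, not CollaMamba's backbone: it assumes fixed scalar parameters `a`, `b`, `c`, whereas Mamba-style models use input-dependent, multi-channel parameters.

```python
def ssm_scan(inputs, a, b, c):
    """Scalar linear state space recurrence:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    One pass over the sequence, so cost grows linearly with its length."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # carry a summary of everything seen so far
        outputs.append(c * h)  # read the output out of the hidden state
    return outputs

# An impulse at the first step decays geometrically through the state.
ys = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
# ys == [2.0, 1.0, 0.5]
```

The hidden state has a fixed size regardless of sequence length, which is why long spatial or temporal sequences stay affordable compared with pairwise attention.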
CV | Nov 25, 2024 | Code
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
Yuchen Xia, Quan Yuan, Guiyang Luo et al.
Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity, where agents have different and fixed perception networks, presents a major challenge: because the perception networks cannot be modified, a semantic gap remains between the intermediate features that agents exchange. Most existing methods bridge the semantic gap through interpreters. However, they either require training a new interpreter for each new agent type, limiting extensibility, or rely on a two-stage interpretation via an intermediate standardized semantic space, causing cumulative semantic loss. To achieve both extensibility in immutable heterogeneous scenarios and low-loss feature interpretation, we propose PolyInter, a polymorphic feature interpreter. It provides an extension point where new agents integrate by overriding only their specific prompts, which are learnable parameters that guide interpretation, while reusing PolyInter's remaining parameters. By leveraging polymorphism, our design enables a single interpreter to accommodate diverse agents and interpret their features into the ego agent's semantic space. Experiments on the OPV2V dataset demonstrate that PolyInter improves collaborative perception precision by up to 11.1% compared to SOTA interpreters, while comparable results can be achieved by training only 1.4% of PolyInter's parameters when adapting to new agents. Code is available at https://github.com/yuchen-xia/PolyInter.
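The extension-point pattern described above, where a new agent supplies only its own prompt and reuses everything else, can be sketched structurally as follows. This is not the PolyInter architecture: the class name `PromptedInterpreter`, the additive use of the prompt, and the single shared weight matrix are hypothetical simplifications.

```python
class PromptedInterpreter:
    """One interpreter shared by all agent types; each type contributes
    only a small prompt vector (hypothetical structure)."""

    def __init__(self, shared_weights):
        self.shared_weights = shared_weights  # reused by every agent type
        self.prompts = {}                     # agent type -> prompt vector

    def register_agent(self, agent_type, prompt):
        """Adding an agent type touches only its own prompt entry."""
        self.prompts[agent_type] = list(prompt)

    def interpret(self, agent_type, feature):
        """Condition the feature on the agent's prompt, then apply the
        shared parameters to map it into the ego agent's space."""
        prompt = self.prompts[agent_type]
        conditioned = [f + p for f, p in zip(feature, prompt)]
        return [
            sum(w * x for w, x in zip(row, conditioned))
            for row in self.shared_weights
        ]

interp = PromptedInterpreter([[1.0, 0.0], [0.0, 1.0]])
interp.register_agent("agent_a", [0.5, -0.5])  # hypothetical agent types
interp.register_agent("agent_b", [0.0, 1.0])
out_a = interp.interpret("agent_a", [1.0, 1.0])
out_b = interp.interpret("agent_b", [1.0, 1.0])
```

In a trained system the shared weights would stay frozen when a new agent type is registered and only the new prompt would be optimized, which matches the abstract's point that a small fraction of parameters is trained per new agent.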
CV | Aug 27, 2025 | Code
Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception
Yang Li, Quan Yuan, Guiyang Luo et al.
Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard critical fine-grained 3D structural cues essential for accurate object recognition and localization. To this end, we first introduce point-level tokens as intermediate representations for collaborative perception. However, point-cloud data are inherently unordered, massive, and position-sensitive, making it challenging to produce compact and aligned point-level token sequences that preserve detailed structural information. Therefore, we present CoPLOT, a novel Collaborative perception framework that utilizes Point-Level Optimized Tokens. It incorporates a point-native processing pipeline, including token reordering, sequence modeling, and multi-agent spatial alignment. A semantic-aware token reordering module generates adaptive 1D reorderings by leveraging scene-level and token-level semantic information. A frequency-enhanced state space model captures long-range sequence dependencies across both spatial and spectral domains, improving the differentiation between foreground tokens and background clutter. Lastly, a neighbor-to-ego alignment module applies a closed-loop process, combining global agent-level correction with local token-level refinement to mitigate localization noise. Extensive experiments on both simulated and real-world datasets show that CoPLOT outperforms state-of-the-art models, with even lower communication and computation overhead. Code will be available at https://github.com/CheeryLeeyy/CoPLOT.
IR | Feb 5, 2024
Understanding and Guiding Weakly Supervised Entity Alignment with Potential Isomorphism Propagation
Yuanyi Wang, Wei Tang, Haifeng Sun et al.
Weakly Supervised Entity Alignment (EA) is the task of identifying equivalent entities across diverse knowledge graphs (KGs) using only a limited number of seed alignments. Despite substantial advances in aggregation-based weakly supervised EA, the underlying mechanisms in this setting remain unexplored. In this paper, we present a propagation perspective to analyze weakly supervised EA and explain the existing aggregation-based EA models. Our theoretical analysis reveals that these models essentially seek propagation operators for pairwise entity similarities. We further prove that, despite the structural heterogeneity of different KGs, the potentially aligned entities within aggregation-based EA models have isomorphic subgraphs, which is the core premise of EA but has not been investigated. Leveraging this insight, we introduce a potential isomorphism propagation operator to enhance the propagation of neighborhood information across KGs. We develop a general EA framework, PipEA, incorporating this operator to improve the accuracy of every type of aggregation-based model without altering the learning process. Extensive experiments substantiate our theoretical findings and demonstrate PipEA's significant performance gains over state-of-the-art weakly supervised EA methods. Our work not only advances the field but also enhances our comprehension of aggregation-based weakly supervised EA.