CVAug 1, 2025

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

arXiv:2508.00359v16 citationsh-index: 8
Originality Incremental advance
AI Analysis

This addresses the challenge of occlusions and limited sensing range for autonomous agents, offering a compatible enhancement to existing methods.

The paper tackles the problem of inefficient and separate multi-agent and multi-time fusion in collaborative perception by proposing CoST, which unifies these into a single spatio-temporal space, resulting in improved efficiency and accuracy with reduced transmission bandwidth.

Collaborative perception shares information among different agents and helps solving problems that individual agents may face, e.g., occlusions and small sensing range. Prior methods usually separate the multi-agent fusion and multi-time fusion into two consecutive steps. In contrast, this paper proposes an efficient collaborative perception that aggregates the observations from different agents (space) and different times into a unified spatio-temporal space simultanesouly. The unified spatio-temporal space brings two benefits, i.e., efficient feature transmission and superior feature fusion. 1) Efficient feature transmission: each static object yields a single observation in the spatial temporal space, and thus only requires transmission only once (whereas prior methods re-transmit all the object features multiple times). 2) superior feature fusion: merging the multi-agent and multi-time fusion into a unified spatial-temporal aggregation enables a more holistic perspective, thereby enhancing perception performance in challenging scenarios. Consequently, our Collaborative perception with Spatio-temporal Transformer (CoST) gains improvement in both efficiency and accuracy. Notably, CoST is not tied to any specific method and is compatible with a majority of previous methods, enhancing their accuracy while reducing the transmission bandwidth.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes