Zhiying Song

CV
h-index: 19
Papers: 4
Citations: 91
Novelty: 40%
AI Score: 34

4 Papers

MA · Oct 12, 2022
A Cooperative Perception System Robust to Localization Errors

Zhiying Song, Fuxi Wen, Hailiang Zhang et al.

Cooperative perception is challenging for safety-critical autonomous driving applications. Errors in the shared position and pose cause an inaccurate relative transform estimation and disrupt the robust mapping of the Ego vehicle. We propose a distributed object-level cooperative perception system called OptiMatch, in which the detected 3D bounding boxes and local state information are shared between the connected vehicles. To correct the noisy relative transform, the local measurements of both connected vehicles (bounding boxes) are utilized, and an optimal transport theory-based algorithm is developed to identify the objects jointly detected by the vehicles along with their correspondences, constructing an associated co-visible set. A correction transform is estimated from the matched object pairs and further applied to the noisy relative transform, followed by global fusion and dynamic mapping. Experimental results show that robust performance is achieved for different levels of location and heading errors, and the proposed framework outperforms the state-of-the-art benchmark fusion schemes, including early, late, and intermediate fusion, on average precision by a large margin when location and/or heading errors occur.
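
The step of estimating a correction transform from matched object pairs can be illustrated with a minimal sketch. It is not the OptiMatch implementation: it assumes the matching is already done (the optimal transport step is skipped), reduces each bounding box to a 2D center, and uses a standard closed-form least-squares rigid fit. The function names, the sample coordinates, and the 3-degree heading error are all hypothetical.

```python
import math

def estimate_correction_2d(src, dst):
    """Least-squares (theta, tx, ty) such that R(theta) @ src_i + t ~= dst_i.

    src and dst are equal-length lists of matched (x, y) points.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centered source point
        bx, by = qx - dx, qy - dy          # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)         # optimal rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)            # translation aligns the centroids
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty

def apply_transform_2d(points, theta, tx, ty):
    """Rotate each point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

# Hypothetical box centers of co-visible objects in the ego frame.
ego_centers = [(10.0, 2.0), (15.0, -3.0), (22.0, 5.0), (30.0, 0.0)]
# The same objects as received from the other vehicle, distorted by an
# assumed pose error: 3 degrees of heading plus a (0.5, -0.3) m offset.
noisy_centers = apply_transform_2d(ego_centers, math.radians(3.0), 0.5, -0.3)

# Estimate the correction from the matched pairs and apply it.
theta, tx, ty = estimate_correction_2d(noisy_centers, ego_centers)
corrected = apply_transform_2d(noisy_centers, theta, tx, ty)
max_residual = max(
    math.hypot(cx - ex, cy - ey)
    for (cx, cy), (ex, ey) in zip(corrected, ego_centers)
)
```

With noise-free matches the recovered heading correction is the inverse of the injected error, and the residual is at floating-point level. With real detections the fit would only be approximate, and its quality would depend on how many correct pairs the matching step provides.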

CV · Nov 17, 2024 · Code
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Lei Yang, Xinyu Zhang, Jun Li et al.

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar, a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets. We will release all datasets and benchmark codebase at https://huggingface.co/datasets/yanglei18/V2X-Radar and https://github.com/yanglei18/V2X-Radar.

CV · Mar 25, 2025
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

Cooperative perception presents significant potential for enhancing the sensing capabilities of individual vehicles; however, inter-agent latency remains a critical challenge. Latencies cause misalignments in both spatial and semantic features, complicating the fusion of real-time observations from the ego vehicle with delayed data from others. To address these issues, we propose TraF-Align, a novel framework that learns the flow path of features by predicting the feature-level trajectory of objects from past observations up to the ego vehicle's current time. By generating temporally ordered sampling points along these paths, TraF-Align directs attention from the current-time query to relevant historical features along each trajectory, supporting the reconstruction of current-time features and promoting semantic interaction across multiple frames. This approach corrects spatial misalignment and ensures semantic consistency across agents, effectively compensating for motion and achieving coherent feature fusion. Experiments on two real-world datasets, V2V4Real and DAIR-V2X-Seq, show that TraF-Align sets a new benchmark for asynchronous cooperative perception.
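
The idea of temporally ordered sampling points along a trajectory can be pictured with a much simpler stand-in. The sketch below is not TraF-Align, which learns feature-level trajectories. It works on object positions instead and assumes constant velocity between the last two delayed observations. The function name, the track values, and the latency are hypothetical.

```python
def extrapolate_trajectory(track, t_now, n_samples=4):
    """Return time-ordered (t, x, y) samples from the last observation to t_now.

    track is a list of (t, x, y) past observations sorted by time. Velocity is
    assumed constant and is taken from the last two observations.
    """
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    vx = (x1 - x0) / (t1 - t0)
    vy = (y1 - y0) / (t1 - t0)
    samples = []
    for k in range(1, n_samples + 1):
        # Evenly spaced timestamps between the last observation and t_now.
        t = t1 + (t_now - t1) * k / n_samples
        samples.append((t, x1 + vx * (t - t1), y1 + vy * (t - t1)))
    return samples

# Hypothetical delayed track: an object moving at 10 m/s along x,
# last observed at t = 0.1 s, while the ego vehicle's current time is 0.3 s.
track = [(0.0, 0.0, 0.0), (0.1, 1.0, 0.0)]
samples = extrapolate_trajectory(track, t_now=0.3, n_samples=2)
```

The final sample sits about 2 m ahead of the last delayed observation, which is the spatial misalignment that 0.2 s of latency would cause at this speed if left uncompensated.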

OH · Apr 30, 2025
Wireless Communication as an Information Sensor for Multi-agent Cooperative Perception: A Survey

Zhiying Song, Tenghui Xie, Fuxi Wen et al.

Cooperative perception extends the perception capabilities of autonomous vehicles by enabling multi-agent information sharing via Vehicle-to-Everything (V2X) communication. Unlike traditional onboard sensors, V2X acts as a dynamic "information sensor" characterized by limited communication, heterogeneity, mobility, and scalability. This survey provides a comprehensive review of recent advancements from the perspective of information-centric cooperative perception, focusing on three key dimensions: information representation, information fusion, and large-scale deployment. We categorize information representation into data-level, feature-level, and object-level schemes, and highlight emerging methods for reducing data volume and compressing messages under communication constraints. In information fusion, we explore techniques under both ideal and non-ideal conditions, including those addressing heterogeneity, localization errors, latency, and packet loss. Finally, we summarize system-level approaches to support scalability in dense traffic scenarios. Compared with existing surveys, this paper introduces a new perspective by treating V2X communication as an information sensor and emphasizing the challenges of deploying cooperative perception in real-world intelligent transportation systems.