Xichan Zhu

CV
h-index: 13
Papers: 7
Citations: 199
Novelty: 41%
AI Score: 32

7 Papers

CV | Apr 28, 2022 | Code
TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving

Lianqing Zheng, Zhixiong Ma, Xichan Zhu et al.

The next-generation high-resolution automotive radar (4D radar) can provide additional elevation measurement and denser point clouds, which has great potential for 3D sensing in autonomous driving. In this paper, we introduce a dataset named TJ4DRadSet with 4D radar points for autonomous driving research. The dataset was collected in various driving scenarios, with a total of 7757 synchronized frames in 44 consecutive sequences, which are well annotated with 3D bounding boxes and track ids. We provide a 4D radar-based 3D object detection baseline for our dataset to demonstrate the effectiveness of deep learning methods for 4D radar point clouds. The dataset can be accessed via the following link: https://github.com/TJRadarLab/TJ4DRadSet.

CV | Jan 26, 2025 | Code
MetaOcc: Spatio-Temporal Fusion of Surround-View 4D Radar and Camera for 3D Occupancy Prediction with Dual Training Strategies

Long Yang, Lianqing Zheng, Wenjin Ai et al.

Robust 3D occupancy prediction is essential for autonomous driving, particularly under adverse weather conditions where traditional vision-only systems struggle. While the fusion of surround-view 4D radar and cameras offers a promising low-cost solution, effectively extracting and integrating features from these heterogeneous sensors remains challenging. This paper introduces MetaOcc, a novel multi-modal framework for omnidirectional 3D occupancy prediction that leverages both multi-view 4D radar and images. To address the limitations of directly applying LiDAR-oriented encoders to sparse radar data, we propose a Radar Height Self-Attention module that enhances vertical spatial reasoning and feature extraction. Additionally, a Hierarchical Multi-scale Multi-modal Fusion strategy is developed to perform adaptive local-global fusion across modalities and time, mitigating spatio-temporal misalignments and enriching fused feature representations. To reduce reliance on expensive point cloud annotations, we further propose a pseudo-label generation pipeline based on an open-set segmentor. This enables a semi-supervised strategy that achieves 90% of the fully supervised performance using only 50% of the ground truth labels, offering an effective trade-off between annotation cost and accuracy. Extensive experiments demonstrate that MetaOcc under full supervision achieves state-of-the-art performance, outperforming previous methods by +0.47 SC IoU and +4.02 mIoU on the OmniHD-Scenes dataset, and by +1.16 SC IoU and +1.24 mIoU on the SurroundOcc-nuScenes dataset. These results demonstrate the scalability and robustness of MetaOcc across sensor domains and training conditions, paving the way for practical deployment in real-world autonomous systems. Code and data are available at https://github.com/LucasYang567/MetaOcc.

CV | Jan 26, 2025
Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception

Lianqing Zheng, Jianan Liu, Runwei Guan et al.

3D object detection and occupancy prediction are critical tasks in autonomous driving, attracting significant attention. Despite the potential of recent vision-based methods, they encounter challenges under adverse conditions. Thus, integrating cameras with next-generation 4D imaging radar to achieve unified multi-task perception is highly significant, though research in this domain remains limited. In this paper, we propose Doracamom, the first framework that fuses multi-view cameras and 4D radar for joint 3D object detection and semantic occupancy prediction, enabling comprehensive environmental perception. Specifically, we introduce a novel Coarse Voxel Queries Generator that integrates geometric priors from 4D radar with semantic features from images to initialize voxel queries, establishing a robust foundation for subsequent Transformer-based refinement. To leverage temporal information, we design a Dual-Branch Temporal Encoder that processes multi-modal temporal features in parallel across BEV and voxel spaces, enabling comprehensive spatio-temporal representation learning. Furthermore, we propose a Cross-Modal BEV-Voxel Fusion module that adaptively fuses complementary features through attention mechanisms while employing auxiliary tasks to enhance feature quality. Extensive experiments on the OmniHD-Scenes, View-of-Delft (VoD), and TJ4DRadSet datasets demonstrate that Doracamom achieves state-of-the-art performance in both tasks, establishing new benchmarks for multi-modal 3D perception. Code and models will be publicly available.

CV | Dec 14, 2024
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

Lianqing Zheng, Long Yang, Qunshu Lin et al.

The rapid advancement of deep learning has intensified the need for comprehensive data for use by autonomous driving algorithms. High-quality datasets are crucial for the development of effective data-driven autonomous driving solutions. Next-generation autonomous driving datasets must be multimodal, incorporating data from advanced sensors that feature extensive data coverage, detailed annotations, and diverse scene representation. To address this need, we present OmniHD-Scenes, a large-scale multimodal dataset that provides comprehensive omnidirectional high-definition data. The OmniHD-Scenes dataset combines data from 128-beam LiDAR, six cameras, and six 4D imaging radar systems to achieve full environmental perception. The dataset comprises 1501 clips, each approximately 30-s long, totaling more than 450K synchronized frames and more than 5.85 million synchronized sensor data points. We also propose a novel 4D annotation pipeline. To date, we have annotated 200 clips with more than 514K precise 3D bounding boxes. These clips also include semantic segmentation annotations for static scene elements. Additionally, we introduce a novel automated pipeline for generation of the dense occupancy ground truth, which effectively leverages information from non-key frames. Alongside the proposed dataset, we establish comprehensive evaluation metrics, baseline models, and benchmarks for 3D detection and semantic occupancy prediction. These benchmarks utilize surround-view cameras and 4D imaging radar to explore cost-effective sensor solutions for autonomous driving applications. Extensive experiments demonstrate the effectiveness of our low-cost sensor configuration and its robustness under adverse conditions. Data will be released at https://www.2077ai.com/OmniHD-Scenes.

SY | Feb 4, 2021
A Learning-based Stochastic Driving Model for Autonomous Vehicle Testing

Lin Liu, Shuo Feng, Yiheng Feng et al.

In the simulation-based testing and evaluation of autonomous vehicles (AVs), how background vehicles (BVs) drive directly influences the AV's driving behavior and further impacts the testing result. Existing simulation platforms use either pre-determined trajectories or deterministic driving models to model the BVs' behaviors. However, pre-determined BV trajectories cannot react to the AV's maneuvers, and deterministic models differ from real human drivers because they lack stochastic components and errors. Both methods lead to unrealistic traffic scenarios. This paper presents a learning-based stochastic driving model that meets the unique needs of AV testing, i.e., being interactive and human-like. The model is built on the long short-term memory (LSTM) architecture. By incorporating quantile regression into the loss function of the model, the stochastic behaviors are reproduced without any prior assumption about human drivers. The model is trained with large-scale naturalistic driving data (NDD) from the Safety Pilot Model Deployment (SPMD) project and then compared with a stochastic intelligent driving model (IDM). Analysis of individual trajectories shows that the proposed model reproduces trajectories more similar to those of human drivers than the IDM does. To validate the ability of the proposed model to generate a naturalistic driving environment, traffic simulation experiments are conducted. The results show that traffic flow parameters such as speed, range, and headway distributions match closely with the NDD, which is of significant importance for AV testing and evaluation.
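The abstract does not spell out the loss, so the following is only a minimal sketch of the standard quantile (pinball) loss that quantile regression usually refers to. The function name `pinball_loss` and the parameter `tau` are illustrative assumptions, not names from the paper.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1).

    Assumption: this is the textbook form of the loss; the paper's exact
    formulation inside its LSTM training loop is not given in the abstract.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) is weighted by tau,
        # over-prediction (diff < 0) by (1 - tau).
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)
```

With `tau = 0.9`, under-predicting by one unit costs 0.9 while over-predicting by one unit costs 0.1, so minimizing the loss pushes the prediction toward the 90th percentile. Training one output per quantile level is one common way to obtain a distribution to sample from without assuming its shape.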

SY | Oct 10, 2019
A Gradual Takeover Strategy of the Active Safety System

Rui Liu, Xichan Zhu, Xuan Zhao et al.

A gradual takeover strategy is proposed that realizes real-time dynamic assignment of the driving privilege and its gradual handover. First, the driving privilege is assigned based on the risk level. Naturalistic driving data are used to study driver behavior in dangerous situations. TTC (time to collision) is defined as the obvious risk measure, whereas the time margin (TM), i.e. the time before the host vehicle has to brake assuming that the target vehicle is braking, is defined as the potential risk measure. A risk assessment algorithm is proposed based on the obvious risk and the potential risk. Second, the gradual handover of the driving privilege is realized. Non-cooperative MPC (model predictive control) is employed to resolve the conflicts between the driver and the active safety system. The naturalistic driving data are used to verify the effectiveness of the risk assessment algorithm, which performs better than TTC in terms of the ROC (receiver operating characteristic). It is shown that the Nash equilibrium of the non-cooperative MPC can be achieved with a non-iterative method. The gradual handover of the driving privilege is realized by updating the confidence matrices. Simulation shows that the gradual takeover strategy achieves a gradual handover of the driving privilege between the driver and the active safety system.
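As a rough illustration of the obvious risk measure, TTC is conventionally the current gap divided by the closing speed. The sketch below assumes that constant-speed definition; the function and parameter names are hypothetical, and the time margin (TM) is left out because the abstract does not give its formula.

```python
import math


def time_to_collision(gap_m, host_speed_mps, target_speed_mps):
    """Constant-speed TTC in seconds for a host vehicle following a target.

    Assumption: both vehicles keep their current speed. If the host is not
    closing in on the target, no collision is predicted and TTC is infinite.
    """
    closing_speed = host_speed_mps - target_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed
```

For example, a 30 m gap with the host at 20 m/s and the target at 10 m/s gives a TTC of 3 s. A lower TTC indicates higher obvious risk.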

RO | Jul 3, 2019
Statistical Characteristics of Driver Acceleration Behavior and Its Probability Model

Rui Liu, Xuan Zhao, Xichan Zhu et al.

Naturalistic driving data were applied to study driver acceleration behaviour, and a probability model of the driver was proposed. First, the question of whether the database is large enough is resolved using kernel density estimation and Kullback-Leibler divergence. Next, the converged database is utilised to obtain the bivariate acceleration distribution pattern. Subsequently, two probability models are proposed to explain the pattern. Finally, the statistical characteristics of the acceleration behaviours are studied to verify the probability models. The longitudinal and lateral acceleration behaviours always approximate a similar Pareto distribution. The braking, accelerating, and steering manoeuvres become more intense at first and then less intense as the velocity increases. These behavioural characteristics reveal the mechanism of the quadrangle bivariate acceleration distribution pattern. The bivariate acceleration behaviour of the driver will never reach a circle-shaped pattern. The bivariate Pareto distribution model can be applied to describe the bivariate acceleration behaviour of the driver.
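One plausible reading of the database-size check is to estimate the acceleration density from a subset and from a larger set, then see whether the Kullback-Leibler divergence between the two estimates is small. The sketch below is a one-dimensional illustration under that assumption. The helper names, the fixed bandwidth, and the evaluation grid are all hypothetical, and the paper's actual procedure may differ.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) between densities on a shared grid.

    Both inputs are renormalized to sum to 1. The eps floor on q is an
    assumption that avoids division by zero where q vanishes.
    """
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_sum, q_i / q_sum
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total
```

In use, one would compute `kl_divergence(gaussian_kde(subset, grid, h), gaussian_kde(full_set, grid, h))` for growing subsets and treat the database as large enough once the value stays below a chosen threshold. The threshold itself is a judgment call that the abstract does not specify.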