Zhichen Wang

h-index1
2papers

2 Papers

ROMar 24, 2023
Interpretable Motion Planner for Urban Driving via Hierarchical Imitation Learning

Bikun Wang, Zhipeng Wang, Chenhao Zhu et al.

Learning-based approaches have achieved remarkable performance in the domain of autonomous driving. Leveraging the impressive ability of neural networks and large amounts of human driving data, complex patterns and rules of driving behavior can be encoded as a model to benefit the autonomous driving system. Besides, an increasing number of data-driven works have been studied in the decision-making and motion planning module. However, the reliability and the stability of the neural network is still full of uncertainty. In this paper, we introduce a hierarchical planning architecture including a high-level grid-based behavior planner and a low-level trajectory planner, which is highly interpretable and controllable. As the high-level planner is responsible for finding a consistent route, the low-level planner generates a feasible trajectory. We evaluate our method both in closed-loop simulation and real world driving, and demonstrate the neural network planner has outstanding performance in complex urban autonomous driving scenarios.

CVDec 28, 2024
DepthMamba with Adaptive Fusion

Zelin Meng, Zhichen Wang

Multi-view depth estimation has achieved impressive performance over various benchmarks. However, almost all current multi-view systems rely on given ideal camera poses, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings. Surprisingly, we find current multi-view depth estimation methods or single-view and multi-view fusion methods will fail when given noisy pose settings. To tackle this challenge, we propose a two-branch network architecture which fuses the depth estimation results of single-view and multi-view branch. In specific, we introduced mamba to serve as feature extraction backbone and propose an attention-based fusion methods which adaptively select the most robust estimation results between the two branches. Thus, the proposed method can perform well on some challenging scenes including dynamic objects, texture-less regions, etc. Ablation studies prove the effectiveness of the backbone and fusion method, while evaluation experiments on challenging benchmarks (KITTI and DDAD) show that the proposed method achieves a competitive performance compared to the state-of-the-art methods.