Zihao Lu

h-index10

5papers

710citations

Novelty17%

AI Score34

Ranked #113,527 of 194,257 authors (top 58%)#37,901 in CV (top 64%)

5 Papers

9.4CVDec 22, 2022

Vision-Based Environmental Perception for Autonomous Driving

Fei Liu, Zihao Lu, Xianke Lin

Visual perception plays an important role in autonomous driving. One of the primary tasks is object detection and identification. Since the vision sensor is rich in color and texture information, it can quickly and accurately identify various road information. The commonly used technique is based on extracting and calculating various features of the image. The recent development of deep learning-based method has better reliability and processing speed and has a greater advantage in recognizing complex elements. For depth estimation, vision sensor is also used for ranging due to their small size and low cost. Monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision is based on parallax and matching feature points of different views, and the application of deep learning also further improves the accuracy. In addition, Simultaneous Location and Mapping (SLAM) can establish a model of the road environment, thus helping the vehicle perceive the surrounding environment and complete the tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare various methods based on monocular, stereo, and RDBG sensors, next review and compare various methods of SLAM, and finally summarize the current problems and present the future development trends of vision technologies.

21.7CVMar 11Code

Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

Yuehao Song, Shaoyu Chen, Hao Gao et al.

Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policy by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between VLM's high-level decision and E2E's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, leading to weakened top-down guidance and decision-following ability of the system. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to E2E policy in the form of implicit embeddings. In the second stage, we align the VLM and the E2E policy in an open-loop setting. In the third stage, we perform closed-loop alignment via bottom-up Hierarchical Reinforcement Learning in 3DGS environments to reinforce the safety and efficiency. Extensive experiments demonstrate that Senna-2 achieves superior dual-system consistency (19.3% F1 score improvement) and significantly enhances driving safety in both open-loop (5.7% FDE reduction) and closed-loop settings (30.6% AF-CR reduction).

1.4CVNov 22, 2022

Vision-based localization methods under GPS-denied conditions

Zihao Lu, Fei Liu, Xianke Lin

This paper reviews vision-based localization methods in GPS-denied environments and classifies the mainstream methods into Relative Vision Localization (RVL) and Absolute Vision Localization (AVL). For RVL, we discuss the broad application of optical flow in feature extraction-based Visual Odometry (VO) solutions and introduce advanced optical flow estimation methods. For AVL, we review recent advances in Visual Simultaneous Localization and Mapping (VSLAM) techniques, from optimization-based methods to Extended Kalman Filter (EKF) based methods. We also introduce the application of offline map registration and lane vision detection schemes to achieve Absolute Visual Localization. This paper compares the performance and applications of mainstream methods for visual localization and provides suggestions for future studies.

4.1LGJul 10

EXHOLD: Experience-Aware Real-Time Hold Control for Large-Scale Ride-Hailing Matching at DiDi

Xu Liu, Kai Wan, Zihao Lu

In large-scale ride-hailing, hold control is a critical mechanism for improving passenger-driver experience. By selectively delaying certain driver-order pairs, the system waits for better opportunities, reduces cancellations, and mitigates wasted driver effort. However, existing industrial hold strategies often rely on heuristic thresholding over multiple predictive models, which can be brittle under non-stationary traffic and hard to optimize for multi-objective experience signals. We propose EXHOLD, a deployable two-stage framework decoupling experience-aware pair assessment from hold-time execution. In Stage I, we learn a decision model assigning each driver-order pair to discrete, interpretable experience tiers by optimizing a unified objective that aggregates satisfaction signals across the matching funnel. In Stage II, we solve for a monotone hold-time schedule via constrained optimization over empirical quantiles. This explicitly enforces service guardrails bounding the unnecessary holding of promising matches while maximizing overall experience improvement. We evaluate EXHOLD through randomized A/B experiments in DiDi's production system in Brazil. Results show consistent gains in marketplace efficiency and experience: EXHOLD increases trip completion and driver income, significantly reduces passenger cancellations, and improves funnel efficiency. Ablations and behavioral analyses confirm both stages are essential and that the policy makes calibrated decisions under spatiotemporal heterogeneity. EXHOLD is currently deployed, serving production traffic in Brazil.

14.8AIJun 17

ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch

Tengfei Lyu, Zirui Yuan, Xu Liu et al.

Bringing Large Language Models (LLMs) into industrial ride-hailing dispatch as semantic feature extractors over platform-scale behavioral logs is a compelling but under-explored data systems problem. Production matching pipelines remain dominated by structured numerical features, yet decisive behavioral signals (e.g., a driver's habitual aversion to certain regions) are inherently contextual and naturally expressible as LLM-generated user profiles. However, scaling such profiling to a live, millisecond-latency dispatcher faces three intertwined constraints rarely addressed together: on a platform with millions of daily orders, logs exceed any LLM's context window by orders of magnitude; most users are long-tail, with too few interactions for per-user profiling; and surface-fluent profiles do not necessarily improve downstream prediction utility. We present ProfiLLM, an agentic LLM data pipeline that operationalizes utility-aligned user profiling for production matching systems through two modules. (1) Tool-Augmented Global Knowledge Mining equips an LLM agent with 27 analytical tools to mine platform-scale data, producing reusable global knowledge, adaptive user clustering rules, and region-level supply-demand priors. (2) Utility-Aligned Profile Exploration generates multiple candidate profiles per cluster, evaluates them via a lightweight downstream utility proxy, iteratively refines the best candidates and constructs preference pairs for DPO fine-tuning. Deployed on DiDi's production dispatcher, ProfiLLM achieves up to +6.14% relative AUC improvement in outcome prediction, up to +4.35% GMV gain in dispatching simulation, and consistent improvements in a 14-day online A/B test including +0.47% GMV, +0.33% Completion Rate, and -0.82% Cancel-Before-Accept rate.