Wenchao Sun

CV
h-index13
10papers
157citations
Novelty54%
AI Score54

10 Papers

82.2CVMar 31Code
SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving

Wenchao Sun, Xuewu Lin, Keyu Chen et al. · tsinghua

End-to-end multi-modal planning has been widely adopted to model the uncertainty of driving behavior, typically by scoring candidate trajectories and selecting the optimal one. Existing approaches generally fall into two categories: scoring a large static trajectory vocabulary, or scoring a small set of dynamically generated proposals. While static vocabularies often suffer from coarse discretization of the action space, dynamic proposals provide finer-grained precision and have shown stronger empirical performance on existing benchmarks. However, it remains unclear whether dynamic generation is fundamentally necessary, or whether static vocabularies can already achieve comparable performance when they are sufficiently dense to cover the action space. In this work, we start with a systematic scaling study of Hydra-MDP, a representative scoring-based method, revealing that performance consistently improves as trajectory anchors become denser, without exhibiting saturation before computational constraints are reached. Motivated by this observation, we propose SparseDriveV2 to push the performance boundary of scoring-based planning through two complementary innovations: (1) a scalable vocabulary representation with a factorized structure that decomposes trajectories into geometric paths and velocity profiles, enabling combinatorial coverage of the action space, and (2) a scalable scoring strategy with coarse factorized scoring over paths and velocity profiles followed by fine-grained scoring on a small set of composed trajectories. By combining these two techniques, SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM, with 89.15 Driving Score and 70.00 Success Rate on Bench2Drive with a lightweight ResNet-34 as backbone. Code and model are released at https://github.com/swc-17/SparseDriveV2.

CVDec 7, 2025Code
SparseCoop: Cooperative Perception with Kinematic-Grounded Queries

Jiahao Wang, Zhongwei Jiang, Wenchao Sun et al.

Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module for robust fusion; and a cooperative instance denoising task to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this with superior computational efficiency, low transmission cost, and strong robustness to communication latency. Code is available at https://github.com/wang-jh18-SVM/SparseCoop.

97.2LGMay 6
CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

Keyu Chen, Nanfei Ye, Yida Wang et al.

Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical events. To stabilize adaptation, CRAFT regularizes the online policy toward an EMA teacher via asymmetric KL self-distillation. Theoretically, CRAFT decomposes the real closed-loop policy gradient into proxy and residual terms under the same visited-state distribution, reducing residual variance with an aligned proxy while mitigating proxy bias through grounded residual approximation. Empirically, CRAFT achieves the strongest closed-loop gains on Bench2Drive across hierarchical planning, vision-language-action, and vocabulary-scoring architectures. Ablations, scaling behavior, stability analyses, and transfer results further validate the complementary roles of dense counterfactual proxy and grounded residual correction. Project page: https://currychen77.github.io/CRAFT.

CVJun 16, 2025Code
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

Yining Shi, Kun Jiang, Qiang Meng et al. · tsinghua

World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolvement (agent interactions), leading to suboptimal predictions. Instead, we propose to separate environmental changes from ego-motion by leveraging the scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch, which are then converted into scene condition using a tailored ControlNet. These condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves 26.3% better mIoU metric than DOME and 23.7% better mIoU metric than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code and videos will be available at https://github.com/synsin0/COME.

CVApr 22, 2024
Neural Radiance Field in Autonomous Driving: A Survey

Lei He, Leheng Li, Wenchao Sun et al.

Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuous void is apparent within the current literature. To bridge this gap, this paper conducts a comprehensive survey of NeRF's applications in the context of AD. Our survey is structured to categorize NeRF's applications in Autonomous Driving (AD), specifically encompassing perception, 3D reconstruction, simultaneous localization and mapping (SLAM), and simulation. We delve into in-depth analysis and summarize the findings for each application category, and conclude by providing insights and discussions on future directions in this field. We hope this paper serves as a comprehensive reference for researchers in this domain. To the best of our knowledge, this is the first survey specifically focused on the applications of NeRF in the Autonomous Driving domain.

ROMay 6, 2025
RIFT: Group-Relative RL Fine-Tuning for Realistic and Controllable Traffic Simulation

Keyu Chen, Wenchao Sun, Hao Cheng et al.

Achieving both realism and controllability in closed-loop traffic simulation remains a key challenge in autonomous driving. Dataset-based methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enhance reliable and controllable closed-loop interactions but often lack expert demonstrations, compromising realism. To address these challenges, we introduce a dual-stage AV-centric simulation framework that conducts imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and route-level controllability, followed by reinforcement learning fine-tuning in a physics-based simulator to enhance style-level controllability and mitigate covariate shift. In the fine-tuning stage, we propose RIFT, a novel group-relative RL fine-tuning strategy that evaluates all candidate modalities through group-relative formulation and employs a surrogate objective for stable optimization, enhancing style-level controllability and mitigating covariate shift while preserving the trajectory-level realism and route-level controllability inherited from IL pre-training. Extensive experiments demonstrate that RIFT improves realism and controllability in traffic simulation while simultaneously exposing the limitations of modern AV systems in closed-loop evaluation. Project Page: https://currychen77.github.io/RIFT/

CVApr 17, 2025
Fully Unified Motion Planning for End-to-End Autonomous Driving

Lin Liu, Caiyan Jia, Ziying Song et al.

Current end-to-end autonomous driving methods typically learn only from expert planning data collected from a single ego vehicle, severely limiting the diversity of learnable driving policies and scenarios. However, a critical yet overlooked fact is that in any driving scenario, multiple high-quality trajectories from other vehicles coexist with a specific ego vehicle's trajectory. Existing methods fail to fully exploit this valuable resource, missing important opportunities to improve the models' performance (including long-tail scenarios) through learning from other experts. Intuitively, Jointly learning from both ego and other vehicles' expert data is beneficial for planning tasks. However, this joint learning faces two critical challenges. (1) Different scene observation perspectives across vehicles hinder inter-vehicle alignment of scene feature representations; (2) The absence of partial modality in other vehicles' data (e.g., vehicle states) compared to ego-vehicle data introduces learning bias. To address these challenges, we propose FUMP (Fully Unified Motion Planning), a novel two-stage trajectory generation framework. Building upon probabilistic decomposition, we model the planning task as a specialized subtask of motion prediction. Specifically, our approach decouples trajectory planning into two stages. In Stage 1, a shared decoder jointly generates initial trajectories for both tasks. In Stage 2, the model performs planning-specific refinement conditioned on an ego-vehicle's state. The transition between the two stages is bridged by a state predictor trained exclusively on ego-vehicle data. To address the cross-vehicle discrepancy in observational perspectives, we propose an Equivariant Context-Sharing Adapter (ECSA) before Stage 1 for improving cross-vehicle generalization of scene representations.

ROJun 5, 2024
FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality

Keyu Chen, Yuheng Lei, Hao Cheng et al.

Generating safety-critical scenarios, which are essential yet difficult to collect at scale, offers an effective method to evaluate the robustness of autonomous vehicles (AVs). Existing methods focus on optimizing adversariality while preserving the naturalness of scenarios, aiming to achieve a balance through data-driven approaches. However, without an appropriate upper bound for adversariality, the scenarios might exhibit excessive adversariality, potentially leading to unavoidable collisions. In this paper, we introduce FREA, a novel safety-critical scenarios generation method that incorporates the Largest Feasible Region (LFR) of AV as guidance to ensure the reasonableness of the adversarial scenarios. Concretely, FREA initially pre-calculates the LFR of AV from offline datasets. Subsequently, it learns a reasonable adversarial policy that controls the scene's critical background vehicles (CBVs) to generate adversarial yet AV-feasible scenarios by maximizing a novel feasibility-dependent adversarial objective function. Extensive experiments illustrate that FREA can effectively generate safety-critical scenarios, yielding considerable near-miss events while ensuring AV's feasibility. Generalization analysis also confirms the robustness of FREA in AV testing across various surrogate AV methods and traffic environments.

LGNov 25, 2021
Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Haitong Ma, Changliu Liu, Shengbo Eben Li et al.

In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how to learn a safe policy without enough data and prior model about the dangerous region? Existing methods mostly use the posterior penalty for dangerous actions, which means that the agent is not penalized until experiencing danger. This fact causes that the agent cannot learn a zero-violation policy even after convergence. Otherwise, it would not receive any penalty and lose the knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, or the safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set on the action space, or the control safe set. Therefore, we can identify the dangerous actions prior to taking them, and further obtain a zero constraint-violation policy after convergence.We claim that we can learn the energy function in a model-free manner similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions make sure that the learned policy will converge to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both the complex simulation environments and a hardware-in-loop (HIL) experiment with a real controller from the autonomous vehicle. Experimental results suggest that the converged policy in all environments achieves zero constraint violation and comparable performance with model-based baselines.

IROct 6, 2019
Parallel Split-Join Networks for Shared-account Cross-domain Sequential Recommendations

Wenchao Sun, Muyang Ma, Pengjie Ren et al.

Sequential recommendation is a task in which one models and uses sequential information about user behavior for recommendation purposes. We study sequential recommendation in a particularly challenging context, in which multiple individual users share asingle account (i.e., they have a shared account) and in which user behavior is available in multiple domains (i.e., recommendations are cross-domain). These two characteristics bring new challenges on top of those of the traditional sequential recommendation task. First, we need to identify the behavior associated with different users and different user roles under the same account in order to recommend the right item to the right user role at the right time. Second, we need to identify behavior in one domain that might be helpful to improve recommendations in other domains. In this work, we study shared account cross-domain sequential recommendation and propose Parallel Split-Join Network (PSJNet), a parallel modeling network to address the two challenges above. We present two variants of PSJNet, PSJNet-I and PSJNet-II. PSJNet-I is a "split-by-join" framework that splits the mixed representations to get role-specific representations and joins them to obtain cross-domain representations at each timestamp simultaneously. PSJNet-II is a "split-and-join" framework that first splits role-specific representations at each timestamp, and then the representations from all timestamps and all roles are joined to obtain cross-domain representations. We use two datasets to assess the effectiveness of PSJNet. Our experimental results demonstrate that PSJNet outperforms state-of-the-art sequential recommendation baselines in terms of MRR and Recall.