Yongyu Chen

73.8ROMar 14

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

Zhexi Lian, Haoran Wang, Xuerun Yan et al.

End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. However, such a paradigm remains suboptimal: sequential RL fine-tuning can introduce policy drift and often leads to a performance ceiling due to its dependence on the pretrained IL policy. To address these issues, we propose PaIR-Drive, a general Parallel framework for collaborative Imitation and Reinforcement learning in end-to-end autonomous driving. During training, PaIR-Drive separates IL and RL into two parallel branches with conflict-free training objectives, enabling fully collaborative optimization. This design eliminates the need to retrain RL when applying a new IL policy. During inference, RL leverages the IL policy to further optimize the final plan, allowing performance beyond prior knowledge of IL. Furthermore, we introduce a tree-structured trajectory neural sampler to group relative policy optimization (GRPO) in the RL branch, which enhances exploration capability. Extensive analysis on NAVSIMv1 and v2 benchmark demonstrates that PaIR-Drive achieves Competitive performance of 91.2 PDMS and 87.9 EPDMS, building upon Transfuser and DiffusionDrive IL baselines. PaIR-Drive consistently outperforms existing RL fine-tuning methods, and could even correct human experts' suboptimal behaviors. Qualitative results further confirm that PaIR-Drive can effectively explore and generate high-quality trajectories.

44.8DSApr 4

Approximation Algorithms for Capacitated Vehicle Routing Problems: A Comprehensive Survey

Yongyu Chen

The Capacitated Vehicle Routing Problem (CVRP) is a core NP-hard problem in the field of combinatorial optimization. It aims to plan optimal routes for a fleet of vehicles with uniform capacity, serving a set of customers with specific demands from a single depot, while minimizing the total travel distance. Due to its extensive applications in logistics, distribution, and supply chain management, CVRP has attracted significant research attention. Theoretically, the problem has been proven to be APX-hard, and in general metric spaces, approximate solutions of arbitrary precision cannot be obtained unless P=NP. These inherent complexities highlight the importance of developing approximation algorithms-finding solutions with provable performance guarantees in polynomial time. This paper aims to provide a systematic and comprehensive survey of the research progress on CVRP approximation algorithms. The paper first strictly defines CVRP and its key variants, and elucidates the sources of its fundamental computational complexity. Subsequently, the article deeply analyzes the core algorithmic frameworks and technical schools of thought in this field, including: the Iterated Tour Partitioning (ITP) framework that laid the foundation of the field and its latest improvements; the evolution of approximation schemes (PTAS/QPTAS) for geometric spaces such as Euclidean space; and modern algorithms for structured graphs like trees, planar graphs, and graphs with bounded highway dimension, with a particular focus on techniques based on metric embedding and linear programming relaxation. Finally, this paper summarizes the current best approximation ratios for various problem settings and systematically outlines the core unresolved open problems in the field, pointing out directions for future research.

Yongyu Chen

2 Papers