RO · AI · Mar 16

CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

arXiv:2603.15771 · 77.11 citations · h-index: 4
AI Analysis

This paper addresses safety in autonomous driving by enabling explicit self-correction, though it is an incremental improvement over existing learning-based planners.

Most learning-based autonomous driving planners cannot correct an unsafe action once it has been proposed. The paper proposes CorrectionPlanner, which wraps planning in a propose-evaluate-correct loop and is trained with imitation learning followed by reinforcement learning. In closed-loop evaluation it reduces collision rate by over 20% on Waymax and achieves state-of-the-art planning scores on nuPlan.

Autonomous driving requires safe planning, but most learning-based planners lack explicit self-correction ability: once an unsafe action is proposed, there is no mechanism to correct it. Thus, we propose CorrectionPlanner, an autoregressive planner with self-correction that models planning as motion-token generation within a propose, evaluate, and correct loop. At each planning step, the policy proposes an action, namely a motion token, and a learned collision critic predicts whether it will induce a collision within a short horizon. If the critic predicts a collision, we retain the sequence of historical unsafe motion tokens as a self-correction trace, generate the next motion token conditioned on it, and repeat this process until a safe motion token is proposed or the safety criterion is met. This self-correction trace, consisting of all unsafe motion tokens, represents the planner's correction process in motion-token space, analogous to a reasoning trace in language models. We train the planner with imitation learning followed by model-based reinforcement learning using rollouts from a pretrained world model that realistically models agents' reactive behaviors. Closed-loop evaluations show that CorrectionPlanner reduces collision rate by over 20% on Waymax and achieves state-of-the-art planning scores on nuPlan.
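The propose, evaluate, and correct loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the callable signatures, the integer token ids, and the `max_corrections` cap (standing in for the unspecified "safety criterion") are all assumptions, and the policy and critic here are toy stand-ins for the learned models.

```python
from typing import Callable, List, Sequence, Tuple


def propose_evaluate_correct(
    propose: Callable[[Sequence[int]], int],
    predicts_collision: Callable[[int], bool],
    max_corrections: int = 8,  # hypothetical stopping rule, not from the paper
) -> Tuple[int, List[int]]:
    """Run one planning step; return (chosen token, self-correction trace)."""
    trace: List[int] = []          # unsafe tokens proposed so far in this step
    token = propose(trace)         # initial proposal, empty trace
    while predicts_collision(token) and len(trace) < max_corrections:
        trace.append(token)        # keep the unsafe token as correction context
        token = propose(trace)     # re-propose, conditioned on the trace
    return token, trace


# Toy stand-ins (assumptions): the "policy" picks the lowest token id not yet
# rejected, and the "critic" flags token ids 0 and 1 as leading to a collision.
VOCAB_SIZE = 4


def toy_policy(trace: Sequence[int]) -> int:
    for candidate in range(VOCAB_SIZE):
        if candidate not in trace:
            return candidate
    return VOCAB_SIZE - 1


def toy_critic(token: int) -> bool:
    return token < 2


if __name__ == "__main__":
    token, trace = propose_evaluate_correct(toy_policy, toy_critic)
    print(token, trace)
```

In this sketch the trace plays the role the abstract assigns to it: it accumulates every rejected motion token and is passed back to the policy so the next proposal is conditioned on what has already been ruled out. If the cap is reached, the last proposal is returned even though the critic still flags it; how the actual planner handles that case is not stated in the abstract.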

