CV, RO · Dec 11, 2025

Latent Chain-of-Thought World Modeling for End-to-End Driving

arXiv:2512.10226v1 · 6 citations · h-index: 31
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of improving driving performance and safety in autonomous vehicles, although it is an incremental advance over existing reasoning methods.

The paper tackles the problem of inefficient text-based reasoning in autonomous driving by introducing Latent-CoT-Drive, which uses a latent language for chain-of-thought reasoning, resulting in faster inference and better trajectory quality on a large-scale benchmark.

Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior work uses natural language to express chain-of-thought (CoT) reasoning before producing driving actions. However, text may not be the most efficient representation for reasoning. In this work, we present Latent-CoT-Drive (LCDrive): a model that expresses CoT in a latent language that captures possible outcomes of the driving actions being considered. Our approach unifies CoT reasoning and decision making by representing both in an action-aligned latent space. Instead of natural language, the model reasons by interleaving (1) action-proposal tokens, which use the same vocabulary as the model's output actions; and (2) world model tokens, which are grounded in a learned latent world model and express future outcomes of these actions. We cold start latent CoT by supervising the model's action proposals and world model tokens based on ground-truth future rollouts of the scene. We then post-train with closed-loop reinforcement learning to strengthen reasoning capabilities. On a large-scale end-to-end driving benchmark, LCDrive achieves faster inference, better trajectory quality, and larger improvements from interactive reinforcement learning compared to both non-reasoning and text-reasoning baselines.
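To make the interleaved reasoning format and the cold-start supervision more concrete, here is a toy sketch under heavy assumptions. It is not the paper's architecture. Every name in it (`ACTION_VOCAB`, `world_model`, `propose_action`, `latent_cot`, `cold_start_loss`) is hypothetical. The learned latent world model is replaced by a hand-written kinematic step, the action-proposal head is replaced by a greedy rule, and the latent state is a 4-tuple rather than a learned embedding. The sketch shows only the token layout (action proposal, then imagined outcome, repeated) and a loss that supervises both token types against a ground-truth rollout. Closed-loop RL post-training is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical action vocabulary, shared by CoT action proposals and the final output.
# Each token is an (acceleration, steering change) pair.
ACTION_VOCAB: List[Tuple[float, float]] = [
    (accel, steer) for accel in (-1.0, 0.0, 1.0) for steer in (-0.1, 0.0, 0.1)
]

# Hypothetical latent state (x, y, heading, speed); a real model would use a learned embedding.
Latent = Tuple[float, float, float, float]


def world_model(z: Latent, action_id: int, dt: float = 0.5) -> Latent:
    """Stand-in for the learned latent world model: one simple kinematic step."""
    x, y, heading, speed = z
    accel, steer = ACTION_VOCAB[action_id]
    speed = max(0.0, speed + accel * dt)
    heading += steer
    return (x + speed * math.cos(heading) * dt, y + speed * math.sin(heading) * dt, heading, speed)


def propose_action(z: Latent, target_speed: float) -> int:
    """Stand-in for the action-proposal head: greedily approach a target speed, prefer no steering."""
    return min(
        range(len(ACTION_VOCAB)),
        key=lambda a: abs(world_model(z, a)[3] - target_speed) + abs(ACTION_VOCAB[a][1]),
    )


def latent_cot(z0: Latent, steps: int, target_speed: float) -> Tuple[list, int]:
    """Interleave action-proposal tokens with world-model tokens, then emit a driving action."""
    if steps < 1:
        raise ValueError("need at least one reasoning step")
    cot: list = []
    z = z0
    for _ in range(steps):
        a = propose_action(z, target_speed)
        z = world_model(z, a)  # world-model token: imagined outcome of proposal a
        cot.append(("action", a))
        cot.append(("world", z))
    # The output action uses the same vocabulary; this toy simply commits to the first proposal.
    return cot, cot[0][1]


def cold_start_loss(
    action_logits: Sequence[Sequence[float]],
    pred_latents: Sequence[Latent],
    gt_actions: Sequence[int],
    gt_latents: Sequence[Latent],
    wm_weight: float = 1.0,
) -> float:
    """Cross-entropy on action proposals plus MSE on world-model tokens vs. a ground-truth rollout."""
    ce = 0.0
    for logits, target in zip(action_logits, gt_actions):
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
        ce += log_norm - logits[target]
    mse = 0.0
    for pred, gt in zip(pred_latents, gt_latents):
        mse += sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    return ce / len(gt_actions) + wm_weight * mse / len(gt_latents)
```

For example, `latent_cot((0.0, 0.0, 0.0, 2.0), steps=3, target_speed=3.0)` produces the token sequence action, world, action, world, action, world. In this sketch, representing proposals and outcomes as short typed tokens, rather than as free-form text, is what keeps the reasoning chain compact.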
