CV, AI, LG, RO | Jul 23, 2025

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

arXiv:2507.17596v2 | 8 citations | h-index: 54 | Has Code
Originality: Incremental advance
AI Analysis

PRIX addresses scalability issues for mass-market autonomous vehicles with camera-only setups, though the advance appears incremental because it builds on existing end-to-end planning methods.

The paper tackles the challenge of deploying end-to-end autonomous driving models by proposing PRIX, an efficient architecture that uses only camera data, without LiDAR or BEV representations. It achieves state-of-the-art performance on the NavSim and nuScenes benchmarks while being significantly more efficient in inference speed and model size.

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without an explicit BEV representation and without the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be available at https://maxiuw.github.io/prix.
