CV · Sep 20, 2025

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

arXiv:2509.16500v2 · 7 citations · h-index: 10
Originality: Highly original
AI Analysis

RLGF addresses the need for reliable synthetic data in autonomous driving development, offering a plug-and-play way to narrow the performance gap with real data.

The paper tackles geometric distortions in synthetic autonomous driving videos, which limit their utility for perception tasks. It introduces RLGF to refine video diffusion models, yielding a 12.7% improvement in 3D object detection mAP and lower geometric errors, including a 57% reduction in depth error and a 21% reduction in VP error.

Synthetic data is crucial for advancing autonomous driving (AD) systems, yet current state-of-the-art video generation models, despite their visual realism, suffer from subtle geometric distortions that limit their utility for downstream perception tasks. We identify and quantify this critical issue, demonstrating a significant performance gap in 3D object detection when using synthetic versus real data. To address this, we introduce Reinforcement Learning with Geometric Feedback (RLGF). RLGF refines video diffusion models by incorporating rewards from specialized latent-space AD perception models. Its core components are an efficient Latent-Space Windowing Optimization technique for targeted feedback during diffusion, and a Hierarchical Geometric Reward (HGR) system that provides multi-level rewards for point-line-plane alignment and scene occupancy coherence. To quantify these distortions, we propose GeoScores. Applied to models like DiVE on nuScenes, RLGF substantially reduces geometric errors (e.g., VP error by 21%, Depth error by 57%) and improves 3D object detection mAP by 12.7%, narrowing the gap to real-data performance. RLGF offers a plug-and-play solution for generating geometrically sound and reliable synthetic videos for AD development.
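As a rough illustration of what a multi-level reward could look like, the sketch below folds four per-level geometric errors (point, line, plane, occupancy) into one scalar. The abstract does not say how HGR combines its levels, so everything here is an assumption made for illustration: the `GeometricErrors` container, the `error_to_reward` mapping, the equal default weights, and the weighted-average combination are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeometricErrors:
    """Hypothetical container for per-level geometric errors (all >= 0)."""
    point: float
    line: float
    plane: float
    occupancy: float


def error_to_reward(error: float, scale: float = 1.0) -> float:
    """Map a non-negative error to a reward in (0, 1]; zero error gives 1.0.

    This inverse mapping is an illustrative choice, not the paper's.
    """
    return 1.0 / (1.0 + error / scale)


def hierarchical_reward(errors: GeometricErrors,
                        weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted average of per-level rewards (assumed combination rule)."""
    components = (errors.point, errors.line, errors.plane, errors.occupancy)
    if any(e < 0 for e in components):
        raise ValueError("errors must be non-negative")
    if len(weights) != len(components) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per level")
    rewards = [error_to_reward(e) for e in components]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


if __name__ == "__main__":
    clean = GeometricErrors(point=0.0, line=0.0, plane=0.0, occupancy=0.0)
    distorted = GeometricErrors(point=2.0, line=1.0, plane=3.0, occupancy=0.5)
    print(hierarchical_reward(clean))
    print(hierarchical_reward(distorted))
```

Under these assumptions, a geometrically clean sample scores higher than a distorted one, which is the property a reinforcement-learning signal of this kind would need. The actual reward design should be taken from the paper itself.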

