RO · Jun 2

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

arXiv:2509.14636
AI Analysis

This work addresses scale drift and information loss in BEV-based visual odometry for autonomous ground robots, offering a practical improvement with real-time edge deployment capability.

BEV-ODOM2 enhances monocular visual odometry for ground robots by introducing dense BEV optical flow supervision and PV-BEV fusion, achieving a 40% improvement in relative translation error over prior BEV-based methods across multiple datasets.

Scale-consistent ego-motion estimation is fundamental for autonomous ground robots. Bird's-Eye-View (BEV) representation naturally addresses the scale drift problem of monocular visual odometry (MVO) by providing a metric-scaled planar workspace, enabling the simplification of 6-DoF ego-motion to a more robust 3-DoF model. However, existing BEV-based methods suffer from two key limitations: sparse supervision signals from pose-only training, and information loss during perspective-to-BEV projection. We present BEV-ODOM2, an enhanced framework that addresses both limitations without requiring additional annotations. Our approach introduces (1) dense BEV optical flow supervision constructed directly from 3-DoF pose ground truth for pixel-level guidance, and (2) Perspective View (PV)-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues. An enhanced rotation sampling strategy further balances diverse motion patterns during training. We evaluate on four datasets with varied spatial scales: KITTI, Oxford, NCLT, and our newly collected ZJH-VO benchmark. BEV-ODOM2 achieves a 40% RTE improvement over prior BEV-based methods, with real-time inference on an NVIDIA Jetson AGX Orin confirming edge deployment feasibility. The source code and the ZJH-VO dataset are publicly released to facilitate future research.
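To make the idea of building a dense flow target from a 3-DoF pose concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a static ground plane, a BEV grid with the ego vehicle at the grid centre, columns along x and rows along y, and a relative pose (dx, dy, dyaw) that gives the pose of frame t+1 expressed in frame t. The function name `bev_flow_from_pose` and all of its parameters are hypothetical.

```python
import math


def bev_flow_from_pose(dx, dy, dyaw, height, width, resolution):
    """Dense flow (in grid cells) induced on a static BEV grid by planar ego-motion.

    Assumptions (not taken from the paper): ego at the grid centre, columns map
    to x, rows map to y, `resolution` is metres per cell, and (dx, dy, dyaw) is
    the pose of frame t+1 expressed in frame t.
    Returns a height x width nested list of (flow_x, flow_y) tuples.
    """
    cos_t, sin_t = math.cos(dyaw), math.sin(dyaw)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # grid centre in cell units
    flow = []
    for row in range(height):
        flow_row = []
        for col in range(width):
            # Cell centre in metres, in the ego frame at time t.
            px = (col - cx) * resolution
            py = (row - cy) * resolution
            # The same static point seen from frame t+1: p' = R(-dyaw) (p - t).
            qx, qy = px - dx, py - dy
            nx = cos_t * qx + sin_t * qy
            ny = -sin_t * qx + cos_t * qy
            # Displacement of the point on the grid, converted back to cells.
            flow_row.append(((nx - px) / resolution, (ny - py) / resolution))
        flow.append(flow_row)
    return flow


if __name__ == "__main__":
    # Moving 1 m forward on a 0.5 m/cell grid shifts every static cell by -2 cells in x.
    field = bev_flow_from_pose(1.0, 0.0, 0.0, height=5, width=5, resolution=0.5)
    print(field[0][0])
```

Under these assumptions, a field like this could serve as a per-cell regression target for a predicted BEV flow map, which is one way to read the "pixel-level guidance" described above. The sign and axis conventions are choices made for this sketch and may differ from those in the released code.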

