RO | Mar 17

Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry

arXiv:2511.21083 | 35.7 | h-index: 3
AI Analysis

This work targets efficient and accurate ego-motion estimation for robotics and augmented reality on resource-constrained platforms. It is an incremental improvement over existing methods.

The paper tackles the trade-off between accuracy and computational cost in Visual-Inertial Odometry (VIO) with a dual-agent reinforcement learning framework that adaptively controls when to run visual processing and how to fuse sensor data. In the authors' evaluation, it achieves the best average absolute trajectory error (ATE) while running up to 1.77 times faster and using less GPU memory than prior GPU-based systems.

Visual-Inertial Odometry (VIO) is a critical component for robust ego-motion estimation, enabling foundational capabilities such as autonomous navigation in robotics and real-time 6-DoF tracking for augmented reality. Existing methods face a well-known trade-off: filter-based approaches are efficient but prone to drift, while optimization-based methods, though accurate, rely on computationally prohibitive Visual-Inertial Bundle Adjustment (VIBA) that is difficult to run on resource-constrained platforms. Rather than removing VIBA altogether, we aim to reduce how often and how heavily it must be invoked. To this end, we cast two key design choices in modern VIO (when to run the visual frontend and how strongly to trust its output) as sequential decision problems, and solve them with lightweight reinforcement learning (RL) agents. Our core contribution is a lightweight, dual-agent RL framework with two components: (1) a Select Agent that gates the entire visual odometry (VO) pipeline based only on high-frequency IMU data; and (2) a composite Fusion Agent that first estimates a robust velocity state via a supervised network and then uses an RL policy to adaptively fuse the full (p, v, q) state. Experiments on the EuRoC MAV and TUM-VI datasets show that, in our unified evaluation, the proposed method achieves a more favorable accuracy-efficiency-memory trade-off than prior GPU-based VO/VIO systems: it attains the best average absolute trajectory error (ATE) while running up to 1.77 times faster and using less GPU memory. Compared to classical optimization-based VIO systems, our approach maintains competitive trajectory accuracy while substantially reducing computational load.

