CV | RO | IV | Sep 12, 2025

Efficient and Accurate Downfacing Visual Inertial Odometry

arXiv:2509.10021v1 | 1 citation | h-index: 10 | IEEE Internet of Things Journal
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for lightweight VIO on resource-constrained drones, though it is incremental as it builds on existing methods.

The paper tackles the problem of implementing efficient and accurate visual inertial odometry (VIO) for micro- and nano-UAVs by optimizing and quantizing feature detection and tracking methods for low-power systems-on-chips, achieving up to a 3.65x reduction in RMSE over a baseline pipeline.

Visual Inertial Odometry (VIO) is a widely used computer vision method that estimates an agent's movement using a camera and an inertial measurement unit (IMU). This paper presents an efficient and accurate VIO pipeline optimized for applications on micro- and nano-UAVs. The proposed design incorporates state-of-the-art feature detection and tracking methods (SuperPoint, PX4FLOW, ORB), all optimized and quantized for emerging RISC-V-based ultra-low-power parallel systems-on-chips (SoCs). Furthermore, by employing a rigid body motion model, the pipeline reduces estimation errors and achieves improved accuracy in planar motion scenarios. The pipeline's suitability for real-time VIO is assessed on an ultra-low-power SoC in terms of compute requirements and tracking accuracy after quantization. The pipeline, including the three feature tracking methods, was implemented on the SoC for real-world validation. This design bridges the gap between high-accuracy VIO pipelines that are traditionally run on computationally powerful systems and lightweight implementations suitable for microcontrollers. With the ORB feature tracker, the optimized pipeline on the GAP9 low-power SoC reduces average RMSE by up to a factor of 3.65 relative to the baseline pipeline. The analysis of the computational complexity of the feature trackers further shows that PX4FLOW achieves tracking accuracy on par with ORB at a lower runtime for movement speeds below 24 pixels/frame.
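The abstract does not spell out the rigid body motion model, so the following is only an illustrative sketch of the general idea: for planar motion seen by a downward-facing camera, all tracked features can be constrained to share a single 2D rotation and translation between frames. The function name `fit_rigid_2d`, the synthetic points, and the closed-form least-squares fit are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def fit_rigid_2d(prev_pts, curr_pts):
    """Least-squares fit of curr ~= R @ prev + t for 2D point pairs.

    Hypothetical helper for illustration; not the paper's implementation.
    Returns the rotation angle (rad), rotation matrix and translation.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)

    # Centre both point sets so the rotation can be estimated on its own.
    prev_mean = prev_pts.mean(axis=0)
    curr_mean = curr_pts.mean(axis=0)
    p = prev_pts - prev_mean
    c = curr_pts - curr_mean

    # Closed-form 2D rotation: sums of cross and dot products of the pairs.
    sin_sum = np.sum(p[:, 0] * c[:, 1] - p[:, 1] * c[:, 0])
    cos_sum = np.sum(p[:, 0] * c[:, 0] + p[:, 1] * c[:, 1])
    theta = np.arctan2(sin_sum, cos_sum)

    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    # Translation maps the rotated previous centroid onto the current one.
    trans = curr_mean - rot @ prev_mean
    return theta, rot, trans


# Synthetic feature tracks: rotate by 0.1 rad and shift by (3, -2) pixels.
rng = np.random.default_rng(0)
prev_features = rng.uniform(0.0, 100.0, size=(20, 2))
true_theta = 0.1
true_rot = np.array([[np.cos(true_theta), -np.sin(true_theta)],
                     [np.sin(true_theta), np.cos(true_theta)]])
true_trans = np.array([3.0, -2.0])
curr_features = prev_features @ true_rot.T + true_trans

theta_est, rot_est, trans_est = fit_rigid_2d(prev_features, curr_features)
```

Because every feature contributes to one shared motion estimate, a single badly tracked point has less influence than it would if each feature's displacement were used on its own. A real pipeline would likely add outlier rejection and fixed-point arithmetic, neither of which is shown here.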
