CV IV Jul 10, 2025

Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms

arXiv:2507.07903v1 h-index: 5 DSD
Originality: Incremental advance
AI Analysis

This work addresses the need for real-time, resource-efficient visual odometry on mobile and embedded platforms such as drones and vehicles. The advance is incremental: it builds on existing quantisation and hardware-optimisation methods.

The authors address efficient visual odometry for autonomous platforms by proposing a quantised SuperPoint CNN for feature extraction. On an FPGA, it processes 640 x 480 images at up to 54 fps, which the authors report outperforms state-of-the-art solutions.
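To put the reported frame rate in perspective, the short calculation below converts it into a per-frame time budget and a pixel throughput. Only the 640 x 480 resolution and the 54 fps figure come from the summary above; the helper names `frame_budget_ms` and `pixel_rate` are illustrative and do not appear in the paper.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def pixel_rate(width, height, fps):
    """Number of pixels that must be processed per second."""
    return width * height * fps


# Figures quoted in the summary: 640 x 480 images at up to 54 fps.
budget = frame_budget_ms(54)
rate = pixel_rate(640, 480, 54)

print(f"per-frame budget: {budget:.2f} ms")   # about 18.52 ms
print(f"pixel throughput: {rate} pixels/s")   # 16588800 pixels/s
```

At the peak rate, the whole feature-extraction pipeline therefore has roughly 18.5 ms per frame.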

Accurate position estimation is essential for modern navigation systems deployed in autonomous platforms, including ground vehicles, marine vessels, and aerial drones. In this context, Visual Simultaneous Localisation and Mapping (VSLAM), which includes Visual Odometry, relies heavily on the reliable extraction of salient feature points from the visual input data. In this work, we propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points. It is based on a quantised SuperPoint convolutional neural network. Our objective is to minimise the computational demands of the model while preserving high detection quality, thus facilitating efficient deployment on platforms with limited resources, such as mobile or embedded systems. We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+, where we evaluated the performance of Deep Learning Processing Units (DPUs). We also used the Brevitas library and the FINN framework to perform model quantisation and hardware-aware optimisation. This allowed us to process 640 x 480 pixel images at up to 54 fps on an FPGA platform, outperforming state-of-the-art solutions in the field. We conducted experiments on the TUM dataset to demonstrate and discuss the impact of different quantisation techniques on the accuracy and performance of the model in a visual odometry task.
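The abstract does not specify which quantisation schemes were compared. As a rough illustration of the general idea only, the sketch below applies plain symmetric uniform quantisation to a handful of made-up weights and shows how the rounding error grows as the bit width shrinks. This is an assumed textbook scheme written in pure Python, not the authors' Brevitas/FINN configuration; the function names and the `weights` values are hypothetical.

```python
def quantise_symmetric(values, bits):
    """Map floats to signed `bits`-bit integer codes and back (fake quantisation).

    Assumption: one scale per tensor, chosen so the largest magnitude
    maps to the largest positive code. Real toolchains offer other choices.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    # Round to the nearest code, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantised = [c * scale for c in codes]
    return codes, dequantised, scale


def max_abs_error(values, bits):
    """Largest absolute difference between original and dequantised values."""
    _, dequantised, _ = quantise_symmetric(values, bits)
    return max(abs(v - d) for v, d in zip(values, dequantised))


# Hypothetical weights, for illustration only.
weights = [-0.62, -0.11, 0.0, 0.07, 0.35, 0.5]

for bits in (8, 4, 2):
    print(f"{bits}-bit max abs error: {max_abs_error(weights, bits):.4f}")
```

Lower bit widths shrink the model and the arithmetic units needed on the FPGA, at the cost of a coarser grid. The experiments on the TUM dataset described above examine this trade-off between accuracy and performance.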

