CV | RO | Jul 16, 2019

Real-time Vision-based Depth Reconstruction with NVidia Jetson

arXiv:1907.07210v1 | 17 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient depth estimation in mobile robotics, but it is incremental as it builds on existing FCNN methods with optimizations for specific hardware.

The authors tackled real-time depth reconstruction from single images for mobile robotics by experimenting with FCNN architectures and introducing enhancements to improve inference efficiency, achieving over 16 FPS on NVidia Jetson for 320 x 240 input and enabling real-time monocular vSLAM.

Vision-based depth reconstruction is a challenging problem that has been extensively studied in computer vision but still lacks a universal solution. Reconstructing depth from a single image is particularly valuable to mobile robotics, as it can be embedded into modern vision-based simultaneous localization and mapping (vSLAM) methods, providing them with the metric information needed to construct accurate maps in real scale. Nowadays, depth reconstruction is typically done via fully-convolutional neural networks (FCNNs). In this work we experiment with several FCNN architectures and introduce a few enhancements aimed at increasing both the effectiveness and the efficiency of the inference. We experimentally determine the solution that provides the best performance/accuracy tradeoff and is able to run on NVidia Jetson at frame rates exceeding 16 FPS for 320 x 240 input. We also evaluate the suggested models by conducting monocular vSLAM of an unknown indoor environment on NVidia Jetson TX2 in real time. An open-source implementation of the models and the inference node for Robot Operating System (ROS) are available at https://github.com/CnnDepth/tx2_fcnn_node.
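
To illustrate why a per-pixel metric depth map gives vSLAM the real-scale information mentioned above, here is a minimal sketch of standard pinhole back-projection. It is not taken from the paper or its repository: the function names, the intrinsics (`FX`, `FY`, `CX`, `CY`), and the constant 2 m depth map are all hypothetical values chosen for illustration. It also works out the per-frame time budget implied by 16 FPS.

```python
def backproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def backproject_depth_map(depth_map, fx, fy, cx, cy):
    """Back-project every pixel with a valid (positive) depth to a 3D point."""
    points = []
    for v, row in enumerate(depth_map):
        for u, depth_m in enumerate(row):
            if depth_m > 0:  # skip pixels with no depth estimate
                points.append(backproject_pixel(u, v, depth_m, fx, fy, cx, cy))
    return points


# Input resolution quoted in the abstract.
WIDTH, HEIGHT = 320, 240

# Hypothetical pinhole intrinsics (assumed, not from the paper).
FX = FY = 300.0
CX, CY = WIDTH / 2, HEIGHT / 2

# Stand-in for a network prediction: a flat surface 2 m from the camera.
depth_map = [[2.0] * WIDTH for _ in range(HEIGHT)]
points = backproject_depth_map(depth_map, FX, FY, CX, CY)

# At 16 FPS, inference plus any post-processing must fit in this budget.
frame_budget_ms = 1000.0 / 16
```

Because the depth values are in metres, the resulting points are in metres as well, which is what lets a monocular SLAM pipeline build a map at real scale instead of up to an unknown scale factor.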

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
