RO · CV · May 31

ActMVS: Active Scene Reconstruction with Monocular Multi-View Stereo

arXiv:2606.01367 · 77.8 · Has Code
AI Analysis

This work addresses the need for low-cost, lightweight active scene reconstruction for robots/UAVs by replacing depth sensors with a monocular camera, though it is incremental as it combines existing techniques.

ActMVS introduces the first monocular active reconstruction framework, enabling robots/UAVs to autonomously plan trajectories and reconstruct environments using only a monocular camera. It achieves performance competitive with RGB-D methods on Replica datasets.

Active scene reconstruction enables robots/UAVs to autonomously plan trajectories and reconstruct environments without costly manual data acquisition. Unlike passive methods, active reconstruction requires real-time construction of high-confidence occupancy maps for collision-free navigation. Existing approaches rely on depth sensors for occupancy map updates, increasing platform cost and weight. To advance spatial intelligence, we aim for a vision-only monocular solution. However, current monocular scene reconstruction methods operate offline and fail to deliver globally consistent dense depth at the frame rates required for robot/UAV navigation. To bridge this gap, we introduce ActMVS, the first framework for monocular active reconstruction. Our framework integrates view factor graph construction for informed Multi-View Stereo depth prediction with global depth optimization, enabling the online generation of high-quality, globally consistent dense depth maps. This allows monocular robots/UAVs to maintain reliable occupancy maps for safe trajectory planning during reconstruction. Experiments on Replica datasets demonstrate performance competitive with RGB-D methods. Our code and data are available at https://github.com/TrickyGo/ActMVS.
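To make the link between dense depth and occupancy maps concrete, here is a minimal sketch of how a predicted depth map could be fused into a voxel occupancy grid. It is not the ActMVS implementation: the function names, the pinhole intrinsics `K`, the camera-to-world pose, the fixed grid, and the log-odds increment `hit` are all illustrative assumptions, and free-space carving along camera rays is omitted.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an HxW depth map to world-space points (assumes a pinhole camera)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # treat non-positive depth as missing
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # 4xN homogeneous
    return (cam_to_world @ pts_cam)[:3].T                   # Nx3

def update_occupancy(log_odds, points, origin, voxel_size, hit=0.85):
    """Add a log-odds 'hit' to every voxel that contains at least one point."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(log_odds.shape)), axis=1)
    idx = np.unique(idx[inside], axis=0)   # one update per voxel per frame
    log_odds[idx[:, 0], idx[:, 1], idx[:, 2]] += hit
    return log_odds

# Toy usage: a flat wall 1 unit in front of an identity-pose camera.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.ones((4, 4))
pts = backproject_depth(depth, K, np.eye(4))
grid = update_occupancy(np.zeros((4, 4, 8)), pts,
                        origin=np.array([-0.5, -0.5, 0.0]), voxel_size=0.25)
occupied = grid > 0                        # voxels a planner would avoid
```

A planner would threshold `grid` to obtain obstacle voxels. The quality of that map depends directly on how consistent the depth is across frames, which is the gap the abstract describes for monocular input.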

