CV | Nov 8, 2023

VioLA: Aligning Videos to 2D LiDAR Scans

arXiv:2311.04783v1 | h-index: 41
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of localizing video data within a larger environment, with applications such as robotics and mapping. It is an incremental advance that builds on existing methods for registration and scene completion.

The paper tackles the problem of aligning a video to a 2D LiDAR scan by introducing VioLA, which builds a semantic map from images and uses inpainting and depth completion to fill missing content, improving pose registration performance by up to 20% on real-world benchmarks.

We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.
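As a rough illustration of the "extract points at a fixed height, then register to the 2D map" step, the sketch below slices a 3D point set at a given height and fits a 2D rigid transform to matched points. It is not VioLA's registration procedure: the function names `slice_at_height` and `fit_rigid_2d`, the tolerance `tol`, and the assumption that point correspondences are already known are all hypothetical simplifications chosen for illustration.

```python
import math


def slice_at_height(points, height, tol=0.05):
    """Keep 3D points whose z is within tol of height; return their (x, y)."""
    return [(x, y) for (x, y, z) in points if abs(z - height) <= tol]


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    Assumption for this sketch: src[i] corresponds to dst[i].
    """
    n = len(src)
    # Centroids of both point sets.
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate dot and cross terms of the centered pairs.
    s_cos = 0.0
    s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx

    # Optimal rotation angle, then the translation that aligns the centroids.
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)
```

In practice the correspondences between the sliced points and the LiDAR map are unknown, so a closed-form fit like this would only serve as the inner step of a full registration method.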

