CV | Nov 20, 2025

YOWO: You Only Walk Once to Jointly Map An Indoor Scene and Register Ceiling-mounted Cameras

Fan Yang, Sosuke Yamao, Ikuo Kusajima, Atsunori Moteki, Shoichi Masui, Shan Jiang

arXiv:2511.16521v1 | 10.23 citations | h-index: 5 | IEEE Transactions on Circuits and Systems for Video Technology (Print)

Originality: Incremental advance

AI Analysis

This work addresses the inefficiency and cost of manually registering ceiling-mounted cameras in indoor environments, offering an automated solution for applications such as surveillance and robotics. The advance is incremental, as it builds on existing visual localization and mapping techniques.

The paper tackles the automatic registration of ceiling-mounted cameras to indoor scene layouts, a task made difficult by visual ambiguity. It proposes a method in which a mobile agent carrying an RGB-D camera jointly maps the scene and registers the cameras in a single walk. The result is a unified framework that improves performance on both tasks, as validated on a new benchmark dataset.

Using ceiling-mounted cameras (CMCs) for indoor visual capturing opens up a wide range of applications. However, registering CMCs to the target scene layout presents a challenging task. While manual registration with specialized tools is inefficient and costly, automatic registration with visual localization may yield poor results when visual ambiguity exists. To alleviate these issues, we propose a novel solution for jointly mapping an indoor scene and registering CMCs to the scene layout. Our approach involves equipping a mobile agent with a head-mounted RGB-D camera to traverse the entire scene once and synchronize CMCs to capture this mobile agent. The egocentric videos generate world-coordinate agent trajectories and the scene layout, while the videos of CMCs provide pseudo-scale agent trajectories and CMC relative poses. By correlating all the trajectories with their corresponding timestamps, the CMC relative poses can be aligned to the world-coordinate scene layout. Based on this initialization, a factor graph is customized to enable the joint optimization of ego-camera poses, scene layout, and CMC poses. We also develop a new dataset, setting the first benchmark for collaborative scene mapping and CMC registration (https://sites.google.com/view/yowo/home). Experimental results indicate that our method not only effectively accomplishes two tasks within a unified framework, but also jointly enhances their performance. We thus provide a reliable tool to facilitate downstream position-aware applications.
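The timestamp-based trajectory correlation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how samples are associated or which alignment is used, so the sketch assumes nearest-timestamp matching followed by a closed-form similarity (Umeyama-style) alignment that maps a pseudo-scale trajectory into world coordinates. The function names `associate_by_timestamp` and `umeyama_alignment` and the `max_dt` threshold are hypothetical, chosen only for illustration.

```python
import numpy as np


def associate_by_timestamp(t_a, t_b, max_dt=0.02):
    """Pair each timestamp in t_a with its nearest neighbour in t_b.

    Assumes both arrays are sorted and t_b has at least two entries.
    Returns index arrays (ia, ib) for pairs closer than max_dt seconds.
    max_dt is an illustrative threshold, not a value from the paper.
    """
    t_a = np.asarray(t_a, dtype=float)
    t_b = np.asarray(t_b, dtype=float)
    # Candidate neighbours on either side of each query timestamp.
    right = np.clip(np.searchsorted(t_b, t_a), 1, len(t_b) - 1)
    left = right - 1
    pick_left = np.abs(t_b[left] - t_a) <= np.abs(t_b[right] - t_a)
    nearest = np.where(pick_left, left, right)
    keep = np.abs(t_b[nearest] - t_a) <= max_dt
    return np.nonzero(keep)[0], nearest[keep]


def umeyama_alignment(src, dst):
    """Least-squares similarity transform with dst ~ s * R @ src + t.

    src, dst: (N, D) arrays of corresponding points.
    Returns scale s, rotation R (D x D), and translation t (D,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance between the centred point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation.
    S = np.eye(src.shape[1])
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * (R @ mu_src)
    return s, R, t
```

In use, one would match the CMC-view and egocentric timestamps with `associate_by_timestamp`, pass the matched agent positions to `umeyama_alignment`, and apply the recovered `(s, R, t)` to the CMC relative poses. In the paper's pipeline such an alignment serves only as initialization before the factor-graph optimization.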
