CV · Mar 14, 2025

Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling

arXiv:2503.11806v2 · 3 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurately modeling complex 3D scene layouts for applications such as robotics and AR/VR. It is an incremental advance because it builds on an existing framework (SceneScript).

The paper improves 3D scene layout estimation with a human-in-the-loop approach: users identify local errors and prompt a model to correct them via infilling. The resulting system maintains global prediction performance while significantly improving local correction ability.

We present a novel human-in-the-loop approach to estimate 3D scene layout that uses human feedback from an egocentric standpoint. We study this approach by introducing a novel local correction task, where users identify local errors and prompt a model to automatically correct them. Building on SceneScript, a state-of-the-art framework for 3D scene layout estimation that leverages structured language, we propose a solution that structures this problem as "infilling", a task studied in natural language processing. We train a multi-task version of SceneScript that maintains performance on global predictions while significantly improving its local correction ability. We integrate this into a human-in-the-loop system, enabling a user to iteratively refine scene layout estimates via a low-friction "one-click fix" workflow. Our system enables the final refined layout to diverge from the training distribution, allowing for more accurate modelling of complex layouts.
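To make the "infilling" framing concrete, here is a minimal Python sketch, assuming the user's flag maps to a contiguous span of structured-language commands and using the generic prefix-suffix-middle prompt layout common in NLP infilling. It is not the paper's implementation. The sentinel tokens `PRE`/`SUF`/`MID`, the command fields, `infill_prompt`, `one_click_fix`, `make_training_example`, `p_infill`, and `stub_model` are all hypothetical. `make_training_example` illustrates one plausible way a single multi-task model could see both global-prediction and infilling targets; the sampling scheme is an assumption.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical sentinel tokens in the prefix-suffix-middle layout common in
# NLP infilling; the paper's actual special tokens are not stated here.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def infill_prompt(commands: List[str], start: int, end: int) -> str:
    """Mask commands[start:end], the span the user flagged as wrong.

    The model sees the commands on both sides of the gap and is asked to
    generate only the missing middle, so the rest of the layout is kept.
    """
    if not 0 <= start <= end <= len(commands):
        raise ValueError(f"invalid span [{start}, {end}) for {len(commands)} commands")
    return "\n".join([PRE, *commands[:start], SUF, *commands[end:], MID])


def one_click_fix(
    commands: List[str],
    start: int,
    end: int,
    generate: Callable[[str], List[str]],
) -> List[str]:
    """Replace the flagged span with the model's infill; keep everything else."""
    middle = generate(infill_prompt(commands, start, end))
    return commands[:start] + middle + commands[end:]


def make_training_example(
    commands: List[str], p_infill: float, rng: random.Random
) -> Tuple[str, List[str]]:
    """Sample one (prompt, target) pair for a hypothetical multi-task model.

    With probability p_infill, mask a random non-empty contiguous span and
    target only that span (infilling); otherwise target the whole layout
    from an empty prompt (global prediction).
    """
    if commands and rng.random() < p_infill:
        start = rng.randrange(len(commands))
        end = rng.randrange(start + 1, len(commands) + 1)
        return infill_prompt(commands, start, end), commands[start:end]
    return "", list(commands)


# Illustrative layout; field names only loosely follow SceneScript-style commands.
layout = [
    "make_wall, id=0, a_x=0.0, a_y=0.0, b_x=4.0, b_y=0.0",
    "make_wall, id=1, a_x=4.0, a_y=0.0, b_x=4.0, b_y=2.5",  # flagged by the user
    "make_door, id=2, wall0_id=0, position_x=1.0",
]


def stub_model(prompt: str) -> List[str]:
    # Stand-in for a trained infilling model: always emits one corrected wall.
    return ["make_wall, id=1, a_x=4.0, a_y=0.0, b_x=4.0, b_y=3.0"]


fixed = one_click_fix(layout, 1, 2, stub_model)
print(fixed[1])  # → make_wall, id=1, a_x=4.0, a_y=0.0, b_x=4.0, b_y=3.0
```

Calling `one_click_fix` again on the returned list mirrors the iterative refinement loop the abstract describes. Because only the flagged span is regenerated, commands the user has already accepted stay untouched.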
