PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
This work addresses a new problem in 3D scene understanding for AI and robotics, but its contribution is incremental: it establishes an initial benchmark and baselines rather than achieving state-of-the-art results.
The paper tackles the novel task of language-guided object placement in real 3D scenes, in which a model must find a valid placement for a 3D asset given a textual prompt. It introduces a new benchmark, a training dataset, and a baseline method to address the task's two main challenges: ambiguity (multiple valid solutions) and geometric reasoning.
We introduce the novel task of Language-Guided Object Placement in Real 3D Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual prompt broadly describing where the 3D asset should be placed. The task is to find a valid placement for the 3D asset that respects the prompt. Compared with other language-guided localization tasks in 3D scenes, such as grounding, this task poses specific challenges: it is ambiguous because it has multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by proposing a new benchmark and evaluation protocol. We also introduce a new dataset for training 3D LLMs on this task, as well as the first method to serve as a non-trivial baseline. We believe that this challenging task and our new benchmark could become part of the suite of benchmarks used to evaluate and compare generalist 3D LLMs.
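To make the free-space requirement concrete, the following is a minimal sketch of a geometric plausibility check for a candidate placement. It is not the paper's evaluation protocol: the axis-aligned `Box` approximation of the asset, the helper names (`points_inside`, `is_supported`, `is_physically_plausible`), the 5 cm support tolerance, and the toy scene are all assumptions made for illustration. It covers only the geometric half of the task and does not test whether a placement respects the textual prompt.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned bounding box of the placed asset (illustrative simplification)."""
    min_corner: Point
    max_corner: Point


def points_inside(box: Box, points: Iterable[Point]) -> int:
    """Count scene points lying strictly inside the box, i.e. collisions."""
    count = 0
    for p in points:
        if all(box.min_corner[i] < p[i] < box.max_corner[i] for i in range(3)):
            count += 1
    return count


def is_supported(box: Box, points: Iterable[Point], tol: float = 0.05) -> bool:
    """True if a scene point lies within `tol` below the box bottom, inside its xy footprint."""
    bottom = box.min_corner[2]
    for x, y, z in points:
        in_footprint = (
            box.min_corner[0] <= x <= box.max_corner[0]
            and box.min_corner[1] <= y <= box.max_corner[1]
        )
        if in_footprint and bottom - tol <= z <= bottom:
            return True
    return False


def is_physically_plausible(box: Box, points: Iterable[Point]) -> bool:
    """A placement is plausible here if it occupies free space and rests on something."""
    pts = list(points)
    return points_inside(box, pts) == 0 and is_supported(box, pts)


# Toy scene (assumed): a 2 m x 2 m floor sampled every 25 cm, plus one obstacle point.
scene = [(x * 0.25, y * 0.25, 0.0) for x in range(9) for y in range(9)]
scene.append((1.5, 1.5, 0.2))

on_floor = Box((0.2, 0.2, 0.0), (0.7, 0.7, 0.5))    # free space, resting on the floor
floating = Box((0.2, 0.2, 1.0), (0.7, 0.7, 1.5))    # free space, but nothing underneath
colliding = Box((1.3, 1.3, 0.0), (1.7, 1.7, 0.5))   # encloses the obstacle point

print(is_physically_plausible(on_floor, scene))
print(is_physically_plausible(floating, scene))
print(is_physically_plausible(colliding, scene))
```

Because many boxes pass such a check for the same prompt, a check of this kind accepts or rejects a candidate placement rather than comparing it against a single ground-truth pose, which is consistent with the ambiguity described above.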