CV · AI · Oct 11, 2025

From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries

arXiv:2510.10292v1 · 4 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the challenge of realistic scene generation for computer vision and robotics applications, though the advance appears incremental because it builds on existing methods such as large language models.

The authors address the problem of generating realistic 3D scenes with varied object poses from limited real-world data such as ScanNet. Their FactoredScenes framework produces rooms that are difficult to distinguish from real ScanNet scenes.

Real-world scenes, such as those in ScanNet, are difficult to capture, with highly limited data available. Generating realistic scenes with varied object poses remains an open and challenging task. In this work, we propose FactoredScenes, a framework that synthesizes realistic 3D scenes by leveraging the underlying structure of rooms while learning the variation of object poses from lived-in scenes. We introduce a factored representation that decomposes scenes into hierarchically organized concepts of room programs and object poses. To encode structure, FactoredScenes learns a library of functions capturing reusable layout patterns from which scenes are drawn, then uses large language models to generate high-level programs, regularized by the learned library. To represent scene variations, FactoredScenes learns a program-conditioned model to hierarchically predict object poses, and retrieves and places 3D objects in a scene. We show that FactoredScenes generates realistic, real-world rooms that are difficult to distinguish from real ScanNet scenes.
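The factoring described in the abstract (structure first, pose variation second) can be illustrated with a small Python sketch. This is not the authors' implementation: the library functions `row_of` and `desk_with_chair`, the `Placement` type, and all numeric values are hypothetical, the room program is written by hand rather than generated by a language model, and the Gaussian perturbation in `sample_poses` is only a stand-in for the learned program-conditioned pose model.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Placement:
    category: str  # object class to retrieve a 3D model for, e.g. "chair"
    x: float       # position in metres (hypothetical room frame)
    y: float
    yaw: float     # orientation in radians


# Hypothetical library functions: reusable layout patterns.
def row_of(category, n, start_x, y, spacing, yaw=0.0):
    """Place n objects of one category in an evenly spaced row."""
    return [Placement(category, start_x + i * spacing, y, yaw) for i in range(n)]


def desk_with_chair(x, y):
    """Place a desk with a chair 0.6 m in front of it, facing the desk."""
    return [
        Placement("desk", x, y, 0.0),
        Placement("chair", x, y - 0.6, math.pi),
    ]


def room_program():
    """Structure only: a tidy, axis-aligned layout built from library calls."""
    return desk_with_chair(1.0, 2.0) + row_of("shelf", 3, 0.5, 0.0, 1.0)


def sample_poses(placements, rng, pos_sigma=0.05, yaw_sigma=0.15):
    """Assumed stand-in for a learned pose model: add Gaussian noise
    to each idealized pose to mimic lived-in variation."""
    return [
        Placement(
            p.category,
            p.x + rng.gauss(0.0, pos_sigma),
            p.y + rng.gauss(0.0, pos_sigma),
            p.yaw + rng.gauss(0.0, yaw_sigma),
        )
        for p in placements
    ]


if __name__ == "__main__":
    scene = sample_poses(room_program(), random.Random(0))
    for obj in scene:
        print(obj)
```

Keeping the program and the pose sampler separate means the same room structure can be resampled into many different lived-in variants, which is the intuition behind the factored representation.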

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
