CV · Jun 1

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

arXiv:2606.02580 · 60.5
AI Analysis

For computer vision and graphics researchers, this work demonstrates that general-purpose VLMs can perform inverse graphics without specialized training, though the improvements are incremental over existing methods.

This work investigates whether pretrained vision-language models can perform executable inverse graphics from a single image by reconstructing a scene as an editable Blender program, without specialized models or multi-view supervision. The proposed staged framework (SEIG) progressively refines geometry, materials, composition, and lighting, achieving improved reconstruction fidelity across pixel-level, perceptual, and semantic metrics.

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing a scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. We introduce Staged Executable Inverse Graphics (SEIG), an agentic framework that reconstructs a 3D scene from a single image by progressively refining scene factors including geometry, materials, composition, and lighting directly in executable Blender code space. We evaluate our framework across diverse scenes using a range of reconstruction metrics spanning pixel-level, perceptual, and semantic fidelity. Our experiments show that staged reconstruction substantially improves reconstruction fidelity, highlighting the importance of task decomposition for executable inverse graphics with general-purpose VLMs. Finally, we showcase various downstream applications enabled by the reconstructed editable Blender scenes.
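The staged idea in the abstract can be illustrated with a minimal sketch. It assumes the four scene factors are refined in the order listed (geometry, materials, composition, lighting) and that each stage keeps the candidate program with the best score. Every name here (`SceneProgram`, `staged_reconstruct`, `propose`, `score`) is hypothetical, and the two stub functions stand in for the VLM call and the render-and-compare step, neither of which is specified in this summary. This is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed stage order, taken from the order the abstract lists the scene factors.
STAGES = ["geometry", "materials", "composition", "lighting"]


@dataclass
class SceneProgram:
    """Hypothetical container: one chunk of Blender Python source per stage."""
    blocks: Dict[str, str] = field(default_factory=dict)

    def source(self) -> str:
        # Concatenate the per-stage chunks in stage order.
        return "\n".join(self.blocks[s] for s in STAGES if s in self.blocks)


def staged_reconstruct(
    propose: Callable[[str, str, int], str],
    score: Callable[[str], float],
    rounds: int = 2,
) -> Tuple[SceneProgram, List[Tuple[str, float]]]:
    """Refine one scene factor at a time, keeping the best candidate per stage."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    program = SceneProgram()
    history: List[Tuple[str, float]] = []
    for stage in STAGES:
        best, best_score = program, float("-inf")
        for attempt in range(rounds):
            # propose() stands in for a VLM that writes code for this stage,
            # conditioned on the program accepted so far.
            code = propose(stage, program.source(), attempt)
            candidate = SceneProgram({**program.blocks, stage: code})
            # score() stands in for rendering the candidate and comparing
            # the render against the input image (higher is better).
            s = score(candidate.source())
            if s > best_score:
                best, best_score = candidate, s
        program = best
        history.append((stage, best_score))
    return program, history


def fake_propose(stage: str, current_source: str, attempt: int) -> str:
    # Stub: a real system would return executable Blender Python here.
    return f"# {stage}: attempt {attempt}"


def fake_score(source: str) -> float:
    # Stub: program length as a placeholder for an image-similarity score.
    return float(len(source))
```

Calling `staged_reconstruct(fake_propose, fake_score)` returns a program with one block per stage plus the per-stage scores. Later stages only ever add to what earlier stages accepted, which is the decomposition the abstract credits for the fidelity gains.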

