CV | Nov 26, 2025

PAT3D: Physics-Augmented Text-to-3D Scene Generation

arXiv:2511.21978v1 | 6 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of creating simulation-ready, intersection-free 3D scenes for applications such as scene editing and robotic manipulation. It represents a novel integration of vision-language models and physics-based simulation rather than an incremental step.

The paper tackles the problem of generating physically plausible 3D scenes from text prompts by introducing PAT3D, a framework that integrates vision-language models with physics-based simulation. It reports substantial improvements in physical plausibility, semantic consistency, and visual quality over prior approaches.

We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision-language models with physics-based simulation to produce physically plausible, simulation-ready, and intersection-free 3D scenes. Given a text prompt, PAT3D generates 3D objects, infers their spatial relations, and organizes them into a hierarchical scene tree, which is then converted into initial conditions for simulation. A differentiable rigid-body simulator ensures realistic object interactions under gravity, driving the scene toward static equilibrium without interpenetrations. To further enhance scene quality, we introduce a simulation-in-the-loop optimization procedure that guarantees physical stability and non-intersection, while improving semantic consistency with the input prompt. Experiments demonstrate that PAT3D substantially outperforms prior approaches in physical plausibility, semantic consistency, and visual quality. Beyond high-quality generation, PAT3D uniquely enables simulation-ready 3D scenes for downstream tasks such as scene editing and robotic manipulation. Code and data will be released upon acceptance.
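The abstract describes the pipeline (scene tree, then initial conditions, then gravity-driven settling to an intersection-free static equilibrium) but not its implementation. The sketch below is a toy illustration of that flow, not PAT3D's method. It makes several assumptions. Objects are axis-aligned boxes. Only vertical motion is simulated. A tree edge means "rests on". Each child's footprint lies inside its parent's, and sibling footprints are disjoint. The paper's differentiable rigid-body simulator is swapped for a plain semi-implicit Euler loop with contact projection. Every name here (`Node`, `tree_to_initial_conditions`, `settle`, `intersecting_pairs`) is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations

GRAVITY = 9.81  # m/s^2, acting along -z


@dataclass
class Node:
    """Hypothetical scene-tree node: an axis-aligned box resting on its parent."""
    name: str
    size: tuple                     # (width_x, depth_y, height_z)
    xy: tuple = (0.0, 0.0)          # absolute horizontal position of the box center
    children: list = field(default_factory=list)


@dataclass
class Body:
    """Simulation state for one box; only vertical motion is modeled (a simplification)."""
    name: str
    size: tuple
    pos: list                       # box center [x, y, z]
    vz: float = 0.0

    def bottom(self):
        return self.pos[2] - self.size[2] / 2

    def top(self):
        return self.pos[2] + self.size[2] / 2


def tree_to_initial_conditions(root, gap=0.05):
    """Place each child `gap` above its parent's top face so no pair starts interpenetrating."""
    bodies = []

    def visit(node, base_z):
        center_z = base_z + node.size[2] / 2
        bodies.append(Body(node.name, node.size, [node.xy[0], node.xy[1], center_z]))
        for child in node.children:
            visit(child, base_z + node.size[2] + gap)

    visit(root, gap)  # the root also starts slightly above the ground plane z = 0
    return bodies


def footprints_overlap(a, b):
    return (abs(a.pos[0] - b.pos[0]) < (a.size[0] + b.size[0]) / 2
            and abs(a.pos[1] - b.pos[1]) < (a.size[1] + b.size[1]) / 2)


def step(bodies, dt):
    """One semi-implicit Euler step under gravity, then project bodies out of contact."""
    ordered = sorted(bodies, key=Body.bottom)   # resolve lower bodies first
    for i, b in enumerate(ordered):
        b.vz -= GRAVITY * dt
        b.pos[2] += b.vz * dt
        # Support: the ground, or the highest top among lower bodies under this footprint.
        support = max([0.0] + [o.top() for o in ordered[:i] if footprints_overlap(o, b)])
        if b.bottom() < support:
            b.pos[2] = support + b.size[2] / 2  # remove the penetration
            b.vz = 0.0                          # perfectly inelastic contact
    return all(b.vz == 0.0 for b in bodies)     # every body resting => static equilibrium


def settle(bodies, dt=0.01, max_steps=10_000):
    """Step until static equilibrium; return the number of steps taken."""
    for n in range(1, max_steps + 1):
        if step(bodies, dt):
            return n
    raise RuntimeError("scene did not reach static equilibrium")


def intersecting_pairs(bodies, eps=1e-9):
    """Pairs of boxes that overlap by more than eps on every axis (touching is allowed)."""
    return [(a.name, b.name) for a, b in combinations(bodies, 2)
            if all((a.size[k] + b.size[k]) / 2 - abs(a.pos[k] - b.pos[k]) > eps
                   for k in range(3))]


scene = Node("table", (1.0, 1.0, 0.75), children=[
    Node("book", (0.3, 0.2, 0.05), xy=(-0.2, 0.0),
         children=[Node("phone", (0.08, 0.15, 0.01), xy=(-0.2, 0.0))]),
    Node("mug", (0.1, 0.1, 0.12), xy=(0.3, 0.2)),
])
bodies = tree_to_initial_conditions(scene)
steps = settle(bodies)
for b in bodies:
    print(f"{b.name}: bottom z = {b.bottom():.3f}")
print("intersecting pairs:", intersecting_pairs(bodies))
```

In this toy scene the table, book, phone, and mug should come to rest with bottoms at 0.000, 0.750, 0.800, and 0.750, and no pair should intersect. This loop is not differentiable, so it cannot stand in for the simulation-in-the-loop optimization the abstract mentions. It only illustrates the settle-then-verify idea.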

