DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
This work addresses physically unsound generated 3D objects, a problem for practical applications such as robotics and manufacturing. It is a novel method for a known bottleneck rather than a foundational advance.
The paper tackles the problem of 3D object generators producing unstable objects by aligning the generators with simulation feedback. The resulting fine-tuned generator is significantly faster and more likely to produce stable objects than test-time optimization methods.
Most 3D object generators prioritize aesthetic quality, often neglecting the physical constraints necessary for practical applications. One such constraint is that a 3D object should be self-supporting, i.e., remain balanced under gravity. Previous approaches to generating stable 3D objects relied on differentiable physics simulators to optimize geometry at test time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models with external feedback, we propose Direct Simulation Optimization (DSO). This framework leverages feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator directly outputs stable 3D objects. We construct a dataset of 3D objects labeled with stability scores obtained from the physics simulator. This dataset enables fine-tuning of the 3D generator using the stability score as an alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO), a novel objective we introduce to align diffusion models without requiring pairwise preferences. Our experiments demonstrate that the fine-tuned feed-forward generator, using either the DPO or DRO objective, is significantly faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework functions even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.
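The DPO route described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how preference pairs are formed or give the exact loss used for the diffusion generator. The sketch assumes stability scores lie in [0, 1], that a hypothetical threshold of 0.5 separates stable from unstable samples, and that per-sample log-likelihoods are available. The loss shown is the standard DPO objective from the alignment literature, not the diffusion-specific variant or the DRO objective. The names `build_preference_pairs` and `dpo_loss` are illustrative.

```python
import math
from itertools import product


def build_preference_pairs(samples, threshold=0.5):
    """Pair every stable sample with every unstable one.

    samples: list of (object_id, stability_score) generated for one
    conditioning input. The [0, 1] score range and the 0.5 threshold
    are assumptions made for this sketch.
    Returns a list of (preferred_id, rejected_id) tuples.
    """
    stable = [oid for oid, score in samples if score >= threshold]
    unstable = [oid for oid, score in samples if score < threshold]
    return list(product(stable, unstable))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-likelihoods under the model being fine-tuned
    ref_logp_* : log-likelihoods under the frozen reference model
    beta       : strength of the implicit penalty for drifting from the reference
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    scored = [("chair_a", 0.9), ("chair_b", 0.1), ("chair_c", 0.7)]
    print(build_preference_pairs(scored))
    print(dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0))
```

Because the pairs come from simulator scores on the generator's own samples, this kind of pipeline needs no ground-truth 3D objects, which is the self-improvement setting the abstract mentions.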