CompoSE: Compositional Synthesis and Editing of 3D Shapes via Part-Aware Control
This work addresses the need for controllable 3D content creation by providing a part-aware method that allows localized editing without part-level text prompts, offering a practical tool for artists and designers.
CompoSE introduces a diffusion transformer for compositional 3D shape synthesis and editing from coarse part layouts. It enables part-level operations such as substitution and resizing, and significantly outperforms existing methods in objective and LLM-based evaluations.
Creating and editing high-quality 3D content remains a central challenge in computer graphics. We address this challenge by introducing CompoSE, a novel method for Compositional Synthesis and Editing of 3D shapes via part-aware control. Our method takes as input a set of coarse geometric primitives (e.g., bounding boxes) that represent distinct object parts arranged in a particular spatial configuration, and synthesizes as output part-separated 3D objects that support localized granular (i.e., compositional) editing of individual parts. The key insight that enables our method is our use of a diffusion transformer architecture that alternates between processing each part locally and aggregating contextual information across parts globally, and features a novel conditioning technique that ensures strong adherence to the user's input. Importantly, our method learns to infer part semantics and symmetries directly from the user's coarse layout guidance, and does not require part-level text prompts. We demonstrate that our method enables powerful part-level editing capabilities, including context-aware substitution, addition, deletion, and style-preserving resizing operations. We show through extensive experiments that our method significantly outperforms existing approaches on guided synthesis, as measured by objective metrics and LLM-based evaluations.
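The alternation between per-part (local) processing and cross-part (global) aggregation described above can be illustrated with a minimal Python sketch. This is not the CompoSE implementation. The function names (`local_block`, `global_block`, `alternating_stack`) are hypothetical, attention here uses identity projections with no learned weights, and the diffusion process and layout conditioning are omitted. The sketch only shows the token-grouping pattern, assuming each part is represented by a list of token vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption for illustration only: a real model would use learned
    projections, multiple heads, residual connections, and normalization.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def local_block(parts):
    """Attend within each part separately; parts cannot see each other."""
    return [self_attention(part) for part in parts]


def global_block(parts):
    """Attend across the tokens of all parts, then split back per part."""
    flat = [tok for part in parts for tok in part]
    mixed = self_attention(flat)
    out, start = [], 0
    for part in parts:
        out.append(mixed[start:start + len(part)])
        start += len(part)
    return out


def alternating_stack(parts, depth=2):
    """Apply `depth` rounds of local processing followed by global aggregation."""
    for _ in range(depth):
        parts = global_block(local_block(parts))
    return parts


if __name__ == "__main__":
    # Two hypothetical parts: one with two tokens, one with a single token.
    parts = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
    result = alternating_stack(parts, depth=2)
    print([len(p) for p in result])  # per-part token counts are preserved
```

In this sketch, changing the tokens of one part leaves the `local_block` output of every other part unchanged, while the `global_block` output does change. That difference is the cross-part context that the global step contributes.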