LADIS: Language Disentanglement for 3D Shape Editing
This work aims to make 3D shape design more accessible through natural language interaction, and it improves the locality of text-driven edits to 3D shapes.
The paper tackles the problem of producing decoupled, local edits to 3D shapes using natural language, and shows that the proposed method outperforms existing state-of-the-art methods by 20% in edit locality and by up to 6.6% in language reference resolution accuracy.
Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network architecture, a disentanglement loss, and a new editing procedure. Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision. We show that our method outperforms existing SOTA methods by 20% in terms of edit locality, and up to 6.6% in terms of language reference resolution accuracy. Our work suggests that by solely disentangling language representations, downstream 3D shape editing can become more local to relevant parts, even if the model was never given explicit part-based supervision.
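To make the idea of measuring edit locality concrete, the sketch below shows one plausible way such a score could be computed. It is not the paper's definition of part-wise edit precision, which the abstract does not specify. It assumes, purely for illustration, that the shape is a point cloud with point-to-point correspondence before and after the edit, that each point carries a part label, and that the part referred to by the language instruction is known. The function name `partwise_edit_precision` and all of its parameters are hypothetical.

```python
import numpy as np


def partwise_edit_precision(before, after, part_labels, target_part, eps=1e-12):
    """Fraction of total point displacement that falls inside the target part.

    Hypothetical illustration only; the paper's actual metric may differ.
    before, after: (N, 3) arrays of corresponding points (assumed).
    part_labels:   (N,) array of per-point part ids (assumed available
                   for evaluation, not for training).
    target_part:   id of the part the edit instruction refers to.
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    part_labels = np.asarray(part_labels)

    # Per-point magnitude of the change introduced by the edit.
    displacement = np.linalg.norm(after - before, axis=1)
    total = displacement.sum()
    if total < eps:
        # Nothing moved, so there is no edit to attribute to any part.
        return 0.0

    # Share of the change that landed on the intended part.
    in_target = displacement[part_labels == target_part].sum()
    return float(in_target / total)


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    shape = np.zeros((4, 3))

    local_edit = shape.copy()
    local_edit[2:] += 1.0          # only part 1 moves
    global_edit = shape + 1.0      # every point moves equally

    print(partwise_edit_precision(shape, local_edit, labels, target_part=1))
    print(partwise_edit_precision(shape, global_edit, labels, target_part=1))
```

Under these assumptions, a score near 1 means the edit changed almost only the referenced part, while an edit that moves every point equally scores the target part's share of the points. Part labels are used here only to evaluate an edit, which is compatible with a model that was never trained with part-based supervision.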