CLIPtortionist: Zero-shot Text-driven Deformation for Manufactured 3D Shapes
This work addresses the problem of intuitive 3D shape editing for designers and creators, though it appears incremental because it builds on existing CLIP and deformation techniques.
The authors tackle text-driven 3D shape deformation for manufactured objects by developing a zero-shot system that deforms 3D meshes to match text descriptions using CLIP-based optimization; the system produces appealing results and outperforms several baselines.
We propose a zero-shot text-driven 3D shape deformation system that deforms an input 3D mesh of a manufactured object to fit an input text description. To do this, our system optimizes the parameters of a deformation model to maximize an objective function based on the widely used pre-trained vision-language model CLIP. We find that CLIP-based objective functions exhibit many spurious local optima; to circumvent them, we parameterize deformations using a novel deformation model called BoxDefGraph, which our system automatically computes from an input mesh. The BoxDefGraph is designed to capture the object-aligned rectangular/circular geometry features of most manufactured objects. We then use the CMA-ES global optimization algorithm to maximize our objective, which we find to work better than popular gradient-based optimizers. We demonstrate that our approach produces appealing results and outperforms several baselines.
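
The idea of maximizing a score with many spurious local optima by sampling, rather than by following gradients, can be illustrated with a small sketch. This is not the authors' implementation and it is not full CMA-ES: it is a simplified evolution strategy that omits covariance matrix adaptation and uses a fixed step-size decay. The function `toy_score` is a hypothetical stand-in for the CLIP-based objective, the parameter vector stands in for deformation parameters, and every name and constant here is an assumption chosen for illustration.

```python
import math
import random


def toy_score(params, target=(1.0, 0.5, 2.0)):
    """Hypothetical stand-in for a CLIP similarity score (higher is better).

    A Rastrigin-style surface: the global maximum (score 0) is at `target`,
    and it is surrounded by many spurious local maxima.
    """
    total = 0.0
    for p, t in zip(params, target):
        d = p - t
        total += d * d + 2.0 * (1.0 - math.cos(2.0 * math.pi * d))
    return -total


def evolution_strategy(score, x0, sigma=1.0, popsize=32, n_parents=8,
                       iters=150, seed=0):
    """Simplified evolution strategy (NOT full CMA-ES).

    Samples candidates from an isotropic Gaussian around the current mean,
    moves the mean to the average of the best candidates, and shrinks the
    step size by a fixed factor each iteration.
    """
    rng = random.Random(seed)
    mean = list(x0)
    best_x, best_s = list(x0), score(x0)
    for _ in range(iters):
        # Sample a population around the current mean.
        population = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(popsize)
        ]
        # Rank candidates by score, best first.
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:n_parents]
        # Recombine: the new mean is the average of the selected parents.
        mean = [
            sum(p[i] for p in parents) / n_parents
            for i in range(len(mean))
        ]
        # Track the best candidate seen so far.
        top_s = score(ranked[0])
        if top_s > best_s:
            best_x, best_s = list(ranked[0]), top_s
        # Fixed step-size decay (an assumption; CMA-ES adapts this itself).
        sigma *= 0.97
    return best_x, best_s


if __name__ == "__main__":
    x, s = evolution_strategy(toy_score, [0.0, 0.0, 0.0])
    print("best parameters:", x)
    print("best score:", s)
```

Because the search only evaluates the score and never differentiates it, the same loop would apply to an objective that is noisy or non-differentiable with respect to the parameters. A real system would replace `toy_score` with a function that applies the deformation, renders the mesh, and scores the rendering against the text.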