ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction
This work addresses a specific challenge in 3D asset creation: giving users an intuitive way to edit textures. It represents an incremental improvement over existing sketch-based methods.
The paper tackles two problems in coarse-grained scribble-based texture editing for 3D models: ambiguous editing intentions and unclear target semantic locations. It proposes ScribbleSense, which uses multimodal large language models to predict the intent behind a scribble and image generation models to extract local texture details. The authors report state-of-the-art interactive editing performance.
Interactive texture editing of 3D models opens new opportunities for creating 3D assets, and freehand drawing offers the most intuitive experience. However, existing methods primarily support sketch-based interaction for outlining, while coarse-grained scribble-based interaction remains little used. Furthermore, current methods struggle with the abstract nature of scribble instructions, which can leave both the editing intent and the target semantic location ambiguous. To address these issues, we propose ScribbleSense, an editing method that combines multimodal large language models (MLLMs) and image generation models. We leverage the visual capabilities of MLLMs to predict the editing intent behind the scribbles. Once the semantic intent of a scribble is discerned, we extract local texture details from globally generated images, thereby anchoring local semantics and reducing ambiguity about the target semantic location. Experimental results indicate that our method effectively leverages the strengths of MLLMs, achieving state-of-the-art interactive editing performance for scribble-based texture editing.
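The two stages described in the abstract (predict the intent, then take local detail from a globally generated image) can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a single 2D view rather than a 3D texture, it assumes the scribble mask itself marks the region to replace, and the names `scribble_edit`, `composite_local`, `predict_intent`, and `generate_global` are hypothetical. The MLLM and the image generator are passed in as plain callables and replaced by stubs in the demo.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]   # rows of RGB pixels
Mask = List[List[bool]]     # True where the user scribbled


@dataclass
class EditResult:
    intent: str
    image: Image


def composite_local(original: Image, generated: Image, mask: Mask) -> Image:
    """Take generated pixels inside the mask and original pixels outside it."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


def scribble_edit(
    view: Image,
    scribble_mask: Mask,
    predict_intent: Callable[[Image, Mask], str],   # stand-in for an MLLM call
    generate_global: Callable[[Image, str], Image], # stand-in for an image generator
) -> EditResult:
    # Stage 1: infer what the scribble is meant to express.
    intent = predict_intent(view, scribble_mask)
    # Stage 2: generate a full image conditioned on that intent,
    # then keep only the local region (assumed here to be the scribble mask).
    generated = generate_global(view, intent)
    return EditResult(intent, composite_local(view, generated, scribble_mask))


# Demo with stub models (assumptions, for illustration only).
def fake_intent(view: Image, mask: Mask) -> str:
    return "add a red patch"


def fake_generate(view: Image, intent: str) -> Image:
    return [[(255, 0, 0) for _ in row] for row in view]


view = [[(0, 0, 0)] * 3 for _ in range(2)]
mask = [[False, True, False], [False, False, True]]
result = scribble_edit(view, mask, fake_intent, fake_generate)
```

Passing the models as callables keeps the sketch independent of any particular MLLM or generator API; only the masked pixels change, which mirrors the idea of using a global image to supply local detail.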