GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
This work addresses the need for scalable, practical 3D stylization in game development and digital arts, and its approach is novel rather than incremental.
The paper tackles the problem of slow and inconsistent 3D stylization by introducing GaussianBlender, a feed-forward framework that enables instant, high-fidelity text-driven edits with multi-view consistency, surpassing optimization-based methods in speed and quality.
3D stylization is central to game development, virtual reality, and digital arts, where the demand for diverse assets calls for scalable methods that support fast, high-fidelity manipulation. Existing text-to-3D stylization methods typically distill from 2D image editors. They require time-intensive per-asset optimization and exhibit multi-view inconsistency due to the limitations of current text-to-image models, which makes them impractical for large-scale production. In this paper, we introduce GaussianBlender, a pioneering feed-forward framework for text-driven 3D stylization that performs edits instantly at inference. Our method learns structured, disentangled latent spaces for geometry and appearance, with controlled information sharing between them, from spatially grouped 3D Gaussians. A latent diffusion model then applies text-conditioned edits to these learned representations. Comprehensive evaluations show that GaussianBlender not only delivers instant, high-fidelity, geometry-preserving, multi-view consistent stylization, but also surpasses methods that require per-instance test-time optimization, unlocking practical, democratized 3D stylization at scale.
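To make the idea of separate geometry and appearance channels over spatially grouped Gaussians concrete, the following is a toy sketch in plain Python. It is not the paper's implementation: the `Gaussian` fields, the voxel-grid grouping, the 10/4 attribute split, and the helper names are all assumptions chosen for illustration, and the hand-written `tint_appearance` function merely stands in for the learned, text-conditioned latent diffusion edit. Nothing here is learned; the sketch only shows how an edit restricted to the appearance channel leaves geometry untouched.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gaussian:
    # Geometry attributes (hypothetical layout, assumed for this sketch).
    position: Tuple[float, float, float]
    scale: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    # Appearance attributes.
    color: Tuple[float, float, float]  # RGB in [0, 1]
    opacity: float


def group_by_voxel(gaussians: List[Gaussian], voxel_size: float) -> Dict[tuple, List[Gaussian]]:
    """Group Gaussians by the voxel that contains their center (assumed grouping rule)."""
    groups = defaultdict(list)
    for g in gaussians:
        key = tuple(int(c // voxel_size) for c in g.position)
        groups[key].append(g)
    return dict(groups)


def split_channels(group: List[Gaussian]):
    """Split a group into a geometry channel (10 values) and an appearance channel (4 values)."""
    geometry = [g.position + g.scale + g.rotation for g in group]
    appearance = [g.color + (g.opacity,) for g in group]
    return geometry, appearance


def tint_appearance(appearance, tint, strength):
    """Toy stand-in for a text-conditioned edit: blend RGB toward `tint`, keep opacity."""
    edited = []
    for r, g, b, a in appearance:
        rgb = tuple((1 - strength) * c + strength * t for c, t in zip((r, g, b), tint))
        edited.append(rgb + (a,))
    return edited


# Three example Gaussians; the first two fall into the same unit voxel.
gaussians = [
    Gaussian((0.1, 0.2, 0.3), (0.05, 0.05, 0.05), (1.0, 0.0, 0.0, 0.0), (0.8, 0.2, 0.2), 0.9),
    Gaussian((0.4, 0.1, 0.2), (0.04, 0.06, 0.05), (1.0, 0.0, 0.0, 0.0), (0.7, 0.3, 0.2), 0.8),
    Gaussian((1.5, 0.2, 0.3), (0.05, 0.05, 0.05), (1.0, 0.0, 0.0, 0.0), (0.2, 0.8, 0.2), 1.0),
]

groups = group_by_voxel(gaussians, voxel_size=1.0)
for key, group in sorted(groups.items()):
    geometry, appearance = split_channels(group)
    edited = tint_appearance(appearance, tint=(0.0, 0.0, 1.0), strength=0.5)
    # Geometry is passed through unchanged; only the appearance channel is edited.
    print(key, len(geometry), edited)
```

In the actual method, both channels would be encoded into learned latents and the edit would be produced by a diffusion model conditioned on a text prompt. The sketch keeps only the structural point that edits can target appearance while the geometry channel is carried through as-is.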