CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
This work addresses multimodal style transfer for researchers and practitioners working with Gaussian Splatting representations. It is an incremental advance that extends existing pipelines rather than replacing them.
The paper tackles the challenge of applying style transfer to Gaussian Splatting-based representations beyond simple color changes. It introduces CLIPGaussian, which the authors describe as the first unified framework supporting text- and image-guided stylization across 2D images, videos, 3D objects, and 4D scenes, and it reports superior style fidelity and consistency across all tasks.
Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussian, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. The CLIPGaussian approach enables joint optimization of color and geometry in 3D and 4D settings, and achieves temporal coherence in videos, while preserving the model size. We demonstrate superior style fidelity and consistency across all tasks, validating CLIPGaussian as a universal and efficient solution for multimodal style transfer.
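The abstract does not spell out the optimization objective. As a rough illustration of how CLIP-guided stylization is commonly driven, the sketch below scores a rendered view against a style condition by cosine distance between two embedding vectors. It assumes the embeddings have already been produced by an image or text encoder (not shown here); the names `cosine_similarity` and `style_loss` are hypothetical, and this is not claimed to be the loss used by CLIPGaussian.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def style_loss(render_embedding: Sequence[float],
               style_embedding: Sequence[float]) -> float:
    """Hypothetical style objective: 1 - cosine similarity.

    The value is 0 when the two embeddings point in the same direction
    and grows (up to 2) as they diverge.
    """
    return 1.0 - cosine_similarity(render_embedding, style_embedding)
```

In a plug-in setting like the one described above, a scalar of this kind would be minimized with respect to the Gaussian parameters (color and, in 3D and 4D, geometry) through a differentiable renderer. That step needs an autodiff framework and is outside the scope of this sketch.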