IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
This work addresses the need for efficient, real-time style attribution to protect intellectual property rights in text-to-image generation, offering a practical solution without the resource-intensive training of existing methods.
The authors introduce IntroStyle, a training-free framework that attributes artistic style in text-to-image models using diffusion model features alone, with no custom datasets or model training. They report that it outperforms state-of-the-art methods by a wide margin on the WikiArt and DomainNet datasets.
Text-to-image (T2I) models have recently gained widespread adoption. This has raised concerns about safeguarding intellectual property rights and increased the demand for mechanisms that prevent the generation of specific artistic styles. Existing methods for style extraction typically require collecting custom datasets and training specialized models, which is resource-intensive, time-consuming, and often impractical for real-time applications. We present a novel, training-free framework that solves the style attribution problem using only the features produced by a diffusion model, without any external modules or retraining. We call this approach Introspective Style attribution (IntroStyle) and show that it outperforms state-of-the-art models for style attribution. We also introduce a synthetic dataset, Artistic Style Split (ArtSplit), to isolate artistic style and evaluate fine-grained style attribution performance. Our experimental results on the WikiArt and DomainNet datasets show that IntroStyle is robust to the dynamic nature of artistic styles, outperforming existing methods by a wide margin.
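To make the idea of training-free attribution from features concrete, the following is a minimal sketch of how style attribution can be posed as retrieval over feature statistics. It is an illustration under stated assumptions, not the method described above: the abstract does not specify which features or which similarity measure IntroStyle uses. Here we assume that each image has already been mapped to a feature tensor of shape (C, H, W), that style is summarized by per-channel mean and standard deviation, and that two images are compared with the squared 2-Wasserstein distance between the resulting diagonal Gaussians. The names style_stats, style_distance, and rank_references are hypothetical.

```python
import numpy as np


def style_stats(features):
    """Summarize a (C, H, W) feature tensor by per-channel mean and std.

    Assumption: channel statistics stand in for 'style'; the source does
    not state which statistics are used.
    """
    flat = features.reshape(features.shape[0], -1)
    return flat.mean(axis=1), flat.std(axis=1)


def style_distance(feat_a, feat_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians fitted
    to the channel statistics of two feature tensors (assumed metric)."""
    mu_a, sd_a = style_stats(feat_a)
    mu_b, sd_b = style_stats(feat_b)
    return float(np.sum((mu_a - mu_b) ** 2) + np.sum((sd_a - sd_b) ** 2))


def rank_references(query_feat, reference_feats):
    """Return reference indices ordered from most to least similar in style."""
    dists = [style_distance(query_feat, ref) for ref in reference_feats]
    return sorted(range(len(dists)), key=lambda i: dists[i])
```

In this sketch no model is trained: attribution reduces to computing statistics of features that already exist and ranking reference images by distance to the query. In practice the feature tensors would come from an intermediate layer of the diffusion model; that extraction step is omitted here because it depends on details not given in the abstract.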