AttnMod: Attention-Based New Art Styles
AttnMod addresses the need for more expressive and controllable art-style generation in AI image synthesis, though the contribution is incremental because it builds on existing diffusion models.
The authors tackle the problem of generating novel art styles in text-to-image diffusion models without retraining. They introduce AttnMod, a training-free technique that modulates cross-attention during denoising to enable diverse stylistic transformations.
We introduce AttnMod, a training-free technique that modulates cross-attention in pre-trained diffusion models to generate novel, unpromptable art styles. The method is inspired by how a human artist might reinterpret a generated image, for example by emphasizing certain features, dispersing color, twisting silhouettes, or materializing unseen elements. AttnMod simulates this intent by altering how the text prompt conditions the image through attention during denoising. These targeted modulations enable diverse stylistic transformations without changing the prompt or retraining the model, and they expand the expressive capacity of text-to-image generation.
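The idea of modulating cross-attention during denoising can be illustrated with a minimal sketch. The text above does not specify the form of the modulation, so everything here is an assumption made for illustration: the scalar `modulation` that rescales the attention logits, the helper names `cross_attention` and `linear_schedule`, and the linear schedule across denoising steps are all hypothetical and are not taken from the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values, modulation=1.0):
    """Single-head cross-attention with a hypothetical scalar modulation.

    queries: image-side features, one vector per spatial position
    keys, values: text-side features, one vector per prompt token
    modulation: assumed scalar applied to the attention logits;
        1.0 reproduces standard attention, values above 1.0 sharpen the
        focus on the best-matching tokens, and 0.0 spreads attention
        uniformly over all tokens.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [
            modulation * sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(logits)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return out


def linear_schedule(step, num_steps, start=1.0, end=1.0):
    """Hypothetical schedule: interpolate the modulation across denoising steps."""
    if num_steps <= 1:
        return start
    return start + (end - start) * step / (num_steps - 1)


# Toy example: one image position attending over two prompt tokens.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]

num_steps = 5
for step in range(num_steps):
    m = linear_schedule(step, num_steps, start=1.0, end=4.0)
    print(step, round(m, 2), cross_attention(q, k, v, modulation=m))
```

In a real diffusion pipeline the same kind of adjustment would be applied inside the cross-attention layers of the pre-trained denoiser at each step, leaving the prompt and the model weights unchanged. How the modulation is parameterized and scheduled in practice is a design choice that this sketch does not settle.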