AIMM · Aug 30, 2025

Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts

arXiv:2509.05323v2 · h-index: 2 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work contributes to Explainable AI for the Arts (XAIxArts) by giving artists tools to explore AI's inner workings as a creative medium. Its technical contribution is incremental, however: it applies existing attention-visualization techniques to a new domain.

The paper investigates attention in video diffusion transformers. Building on the open-source Wan model, it develops a method to extract and visualize cross-attention maps, offering interpretable insight into text-to-video generation for artistic applications.
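The extraction step can be sketched with NumPy. This is a minimal, single-head illustration under stated assumptions, not the authors' implementation: it assumes the video latent tokens are flattened in frame-major (T, H, W) order, and it takes the query and key matrices of one cross-attention layer as given (capturing them from Wan, for example with forward hooks, is not shown). The function name `cross_attention_maps` and the toy shapes are hypothetical.

```python
import numpy as np

def cross_attention_maps(q, k, grid):
    """Return per-text-token attention volumes of shape (n_text, T, H, W).

    q: (T*H*W, d) queries from video latent tokens (assumed frame-major order)
    k: (n_text, d) keys projected from the prompt's text embeddings
    grid: (T, H, W) latent grid the video tokens were flattened from
    """
    t, h, w = grid
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (T*H*W, n_text) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)    # softmax over text tokens per video token
    # Transpose so each text token owns one row, then unflatten it into a (T, H, W) volume.
    return probs.T.reshape(k.shape[0], t, h, w)

rng = np.random.default_rng(0)
grid = (4, 6, 8)                          # hypothetical latent grid: 4 frames of 6x8 tokens
q = rng.standard_normal((4 * 6 * 8, 64))  # hypothetical video-token queries
k = rng.standard_normal((5, 64))          # hypothetical keys for 5 prompt tokens
maps = cross_attention_maps(q, k, grid)
print(maps.shape)  # (5, 4, 6, 8)
```

Because the softmax runs over text tokens, the maps for all prompt tokens sum to 1 at every spatio-temporal position. Averaging over attention heads, layers, and denoising steps is one plausible way to aggregate such maps before display; that choice is an assumption here, not taken from the paper.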

This paper presents an artistic and technical investigation into the attention mechanisms of video diffusion transformers. Inspired by early video artists who manipulated analog video signals to create new visual aesthetics, this study proposes a method for extracting and visualizing cross-attention maps in generative video models. Built on the open-source Wan model, our tool provides an interpretable window into the temporal and spatial behavior of attention in text-to-video generation. Through exploratory probes and an artistic case study, we examine the potential of attention maps as both analytical tools and raw artistic material. This work contributes to the growing field of Explainable AI for the Arts (XAIxArts), inviting artists to reclaim the inner workings of AI as a creative medium.
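To treat an attention map as raw visual material, a single token's attention volume can be rendered as a sequence of grayscale frames. The sketch below is an assumption-laden illustration rather than the paper's pipeline: it takes one volume of shape (T, H, W), offers two hypothetical normalization modes (whole-clip versus per-frame), and adds a per-frame attention-mass curve as one simple view of temporal behavior. The function names `to_frames` and `temporal_profile` are hypothetical, and upsampling from the coarse latent grid to video resolution is omitted.

```python
import numpy as np

def to_frames(attn, per_frame=False):
    """Map an attention volume (T, H, W) to uint8 grayscale frames (T, H, W).

    per_frame=False normalizes over the whole clip, preserving how attention
    strength changes over time; per_frame=True rescales each frame on its own,
    emphasizing spatial layout within every frame.
    """
    axes = (1, 2) if per_frame else None
    lo = attn.min(axis=axes, keepdims=True)
    hi = attn.max(axis=axes, keepdims=True)
    scaled = (attn - lo) / np.maximum(hi - lo, 1e-12)  # guard against flat frames
    return np.round(scaled * 255).astype(np.uint8)

def temporal_profile(attn):
    """Total attention mass the token receives in each frame, shape (T,)."""
    return attn.sum(axis=(1, 2))

attn = np.linspace(0.0, 1.0, 4 * 2 * 3).reshape(4, 2, 3)  # toy volume: 4 frames of 2x3
clip = to_frames(attn)
frames = to_frames(attn, per_frame=True)
print(clip[0, 0, 0], clip[-1, -1, -1])  # 0 255
print(temporal_profile(attn))
```

Whole-clip normalization keeps a fading or intensifying token visible as a change in brightness across frames, whereas per-frame normalization discards that temporal signal in exchange for maximum spatial contrast; which trade-off suits a given artistic use is a judgment call.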
