MMApr 22
AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative ProbeAdam Cole, Mick Grierson
We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics over linear edits. AttentionBender contributes a tool that functions both as an Explainable AI style probe of transformer attention mechanisms, and as a creative technique for producing novel aesthetics beyond the model's learned representational space.
AIAug 30, 2025Code
Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArtsAdam Cole, Mick Grierson
This paper presents an artistic and technical investigation into the attention mechanisms of video diffusion transformers. Inspired by early video artists who manipulated analog video signals to create new visual aesthetics, this study proposes a method for extracting and visualizing cross-attention maps in generative video models. Built on the open-source Wan model, our tool provides an interpretable window into the temporal and spatial behavior of attention in text-to-video generation. Through exploratory probes and an artistic case study, we examine the potential of attention maps as both analytical tools and raw artistic material. This work contributes to the growing field of Explainable AI for the Arts (XAIxArts), inviting artists to reclaim the inner workings of AI as a creative medium.
HCAug 25, 2023
Antagonising explanation and revealing bias directly through sequencing and multimodal inferenceLuís Arandas, Mick Grierson, Miguel Carvalhais
Deep generative models produce data according to a learned representation, e.g. diffusion models, through a process of approximation computing possible samples. Approximation can be understood as reconstruction and the large datasets used to train models as sets of records in which we represent the physical world with some data structure (photographs, audio recordings, manuscripts). During the process of reconstruction, e.g., image frames develop each timestep towards a textual input description. While moving forward in time, frame sets are shaped according to learned bias and their production, we argue here, can be considered as going back in time; not by inspiration on the backward diffusion process but acknowledging culture is specifically marked in the records. Futures of generative modelling, namely in film and audiovisual arts, can benefit by dealing with diffusion systems as a process to compute the future by inevitably being tied to the past, if acknowledging the records as to capture fields of view at a specific time, and to correlate with our own finite memory ideals. Models generating new data distributions can target video production as signal processors and by developing sequences through timelines we ourselves also go back to decade-old algorithmic and multi-track methodologies revealing the actual predictive failure of contemporary approaches to synthesis in moving image, both as relevant to composition and not explanatory.
CVSep 4, 2024
Coral Model Generation from Single Images for Virtual Reality ApplicationsJie Fu, Shun Fu, Mick Grierson
With the rapid development of VR technology, the demand for high-quality 3D models is increasing. Traditional methods struggle with efficiency and quality in large-scale customization. This paper introduces a deep-learning framework that generates high-precision 3D coral models from a single image. Using the Coral dataset, the framework extracts geometric and texture features, performs 3D reconstruction, and optimizes design and material blending. Advanced optimization and polygon count control ensure shape accuracy, detail retention, and flexible output for various complexities, catering to high-quality rendering and real-time interaction needs.The project incorporates Explainable AI (XAI) to transform AI-generated models into interactive "artworks," best viewed in VR and XR. This enhances model interpretability and human-machine collaboration. Real-time feedback in VR interactions displays information like coral species and habitat, enriching user experience. The generated models surpass traditional methods in detail, visual quality, and efficiency. This research offers an intelligent approach to 3D content creation for VR, lowering production barriers, and promoting widespread VR applications. Additionally, integrating XAI provides new insights into AI-generated visual content and advances research in 3D vision interpretability.
CYFeb 5, 2024
Avoiding an AI-imposed Taylor's Version of all music historyNick Collins, Mick Grierson
As future musical AIs adhere closely to human music, they may form their own attachments to particular human artists in their databases, and these biases may in the worst case lead to potential existential threats to all musical history. AI super fans may act to corrupt the historical record and extant recordings in favour of their own preferences, and preservation of the diversity of world music culture may become even more of a pressing issue than the imposition of 12 tone equal temperament or other Western homogenisations. We discuss the technical capability of AI cover software and produce Taylor's Versions of famous tracks from Western pop history as provocative examples; the quality of these productions does not affect the overall argument (which might even see a future AI try to impose the sound of paperclips onto all existing audio files, let alone Taylor Swift). We discuss some potential defenses against the danger of future musical monopolies, whilst analysing the feasibility of a maximal 'Taylor Swiftication' of the complete musical record.
CLMay 30, 2025
Guiding Generative Storytelling with Knowledge GraphsZhijun Pan, Antonios Andronis, Eva Hayek et al.
Large Language Models (LLMs) have shown great potential in automated story generation, but challenges remain in maintaining long-form coherence and providing users with intuitive and effective control. Retrieval-Augmented Generation (RAG) has proven effective in reducing hallucinations in text generation; however, the use of structured data to support generative storytelling remains underexplored. This paper investigates how knowledge graphs (KGs) can enhance LLM-based storytelling by improving narrative quality and enabling user-driven modifications. We propose a KG-assisted storytelling pipeline and evaluate its effectiveness through a user study with 15 participants. Participants created their own story prompts, generated stories, and edited knowledge graphs to shape their narratives. Through quantitative and qualitative analysis, our findings demonstrate that knowledge graphs significantly enhance story quality in action-oriented and structured narratives within our system settings. Additionally, editing the knowledge graph increases users' sense of control, making storytelling more engaging, interactive, and playful.
LGJul 12, 2021
Active Divergence with Generative Deep Learning -- A Survey and TaxonomyTerence Broad, Sebastian Berns, Simon Colton et al.
Generative deep learning systems offer powerful tools for artefact generation, given their ability to model distributions of data and generate high-fidelity results. In the context of computational creativity, however, a major shortcoming is that they are unable to explicitly diverge from the training data in creative ways and are limited to fitting the target data distribution. To address these limitations, there have been a growing number of approaches for optimising, hacking and rewriting these models in order to actively diverge from the training data. We present a taxonomy and comprehensive survey of the state of the art of active divergence techniques, highlighting the potential for computational creativity researchers to advance these methods and use deep generative models in truly creative systems.
AIJul 9, 2020
Guru, Partner, or Pencil Sharpener? Understanding Designers' Attitudes Towards Intelligent Creativity Support ToolsAngus Main, Mick Grierson
Creativity Support Tools (CST) aim to enhance human creativity, but the deeply personal and subjective nature of creativity makes the design of universal support tools challenging. Individuals develop personal approaches to creativity, particularly in the context of commercial design where signature styles and techniques are valuable commodities. Artificial Intelligence (AI) and Machine Learning (ML) techniques could provide a means of creating 'intelligent' CST which learn and adapt to personal styles of creativity. Identifying what kind of role such tools could play in the design process requires a better understanding of designers' attitudes towards working with AI, and their willingness to include it in their personal creative process. This paper details the results of a survey of professional designers which indicates a positive and pragmatic attitude towards collaborating with AI tools, and a particular opportunity for incorporating them in the research stages of a design project.
CVMay 25, 2020
Network Bending: Expressive Manipulation of Deep Generative ModelsTerence Broad, Frederic Fol Leymarie, Mick Grierson
We introduce a new framework for manipulating and interacting with deep generative models that we call network bending. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for analysing the deep generative model and clustering features based on their spatial activation maps. This allows features to be grouped together based on spatial similarity in an unsupervised fashion. This results in the meaningful manipulation of sets of features that correspond to the generation of a broad array of semantically significant features of the generated images. We outline this framework, demonstrating our results on state-of-the-art deep generative models trained on several image datasets. We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
CVFeb 28, 2020
Learning to See: You Are What You SeeMemo Akten, Rebecca Fiebrink, Mick Grierson
The authors present a visual instrument developed as part of the creation of the artwork Learning to See. The artwork explores bias in artificial neural networks and provides mechanisms for the manipulation of specifically trained for real-world representations. The exploration of these representations acts as a metaphor for the process of developing a visual understanding and/or visual vocabulary of the world. These representations can be explored and manipulated in real time, and have been produced in such a way so as to reflect specific creative perspectives that call into question the relationship between how both artificial neural networks and humans may construct meaning.
CVFeb 27, 2020
Deep Meditations: Controlled navigation of latent spaceMemo Akten, Rebecca Fiebrink, Mick Grierson
We introduce a method which allows users to creatively explore and navigate the vast latent spaces of deep generative models. Specifically, our method enables users to \textit{discover} and \textit{design} \textit{trajectories} in these high dimensional spaces, to construct stories, and produce time-based media such as videos---\textit{with meaningful control over narrative}. Our goal is to encourage and aid the use of deep generative models as a medium for creative expression and story telling with meaningful human control. Our method is analogous to traditional video production pipelines in that we use a conventional non-linear video editor with proxy clips, and conform with arrays of latent space vectors. Examples can be seen at \url{http://deepmeditations.ai}.
CVFeb 17, 2020
Amplifying The UncannyTerence Broad, Frederic Fol Leymarie, Mick Grierson
Deep neural networks have become remarkably good at producing realistic deepfakes, images of people that (to the untrained eye) are indistinguishable from real images. Deepfakes are produced by algorithms that learn to distinguish between real and fake images and are optimised to generate samples that the system deems realistic. This paper, and the resulting series of artworks Being Foiled explore the aesthetic outcome of inverting this process, instead optimising the system to generate images that it predicts as being fake. This maximises the unlikelihood of the data and in turn, amplifies the uncanny nature of these machine hallucinations.
LGOct 6, 2019
Transforming the output of GANs by fine-tuning them with features from different datasetsTerence Broad, Mick Grierson
In this work we present a method for fine-tuning pre-trained GANs with features from different datasets, resulting in the transformation of the output distribution into a new distribution with novel characteristics. The weights of the generator are updated using the weighted sum of the losses from a cross-dataset classifier and the frozen weights of the pre-trained discriminator. We discuss details of the technical implementation and share some of the visual results from this training process.
LGOct 6, 2019
Searching for an (un)stable equilibrium: experiments in training generative models without dataTerence Broad, Mick Grierson
This paper details a developing artistic practice around an ongoing series of works called (un)stable equilibrium. These works are the product of using modern machine toolkits to train generative models without data, an approach akin to traditional generative art where dynamical systems are explored intuitively for their latent generative possibilities. We discuss some of the guiding principles that have been learnt in the process of experimentation, present details of the implementation of the first series of works and discuss possibilities for future experimentation.
CVSep 24, 2017
Calligraphic Stylisation Learning with a Physiologically Plausible Model of Movement and Recurrent Neural NetworksDaniel Berio, Memo Akten, Frederic Fol Leymarie et al.
We propose a computational framework to learn stylisation patterns from example drawings or writings, and then generate new trajectories that possess similar stylistic qualities. We particularly focus on the generation and stylisation of trajectories that are similar to the ones that can be seen in calligraphy and graffiti art. Our system is able to extract and learn dynamic and visual qualities from a small number of user defined examples which can be recorded with a digitiser device, such as a tablet, mouse or motion capture sensors. Our system is then able to transform new user drawn traces to be kinematically and stylistically similar to the training examples. We implement the system using a Recurrent Mixture Density Network (RMDN) combined with a representation given by the parameters of the Sigma Lognormal model, a physiologically plausible model of movement that has been shown to closely reproduce the velocity and trace of human handwriting gestures.
AIDec 14, 2016
Collaborative creativity with Monte-Carlo Tree Search and Convolutional Neural NetworksMemo Akten, Mick Grierson
We investigate a human-machine collaborative drawing environment in which an autonomous agent sketches images while optionally allowing a user to directly influence the agent's trajectory. We combine Monte Carlo Tree Search with image classifiers and test both shallow models (e.g. multinomial logistic regression) and deep Convolutional Neural Networks (e.g. LeNet, Inception v3). We found that using the shallow model, the agent produces a limited variety of images, which are noticably recogonisable by humans. However, using the deeper models, the agent produces a more diverse range of images, and while the agent remains very confident (99.99%) in having achieved its objective, to humans they mostly resemble unrecognisable 'random' noise. We relate this to recent research which also discovered that 'deep neural networks are easily fooled' \cite{Nguyen2015} and we discuss possible solutions and future directions for the research.
AIDec 14, 2016
Real-time interactive sequence generation and control with Recurrent Neural Network ensemblesMemo Akten, Mick Grierson
Recurrent Neural Networks (RNN), particularly Long Short Term Memory (LSTM) RNNs, are a popular and very successful method for learning and generating sequences. However, current generative RNN techniques do not allow real-time interactive control of the sequence generation process, thus aren't well suited for live creative expression. We propose a method of real-time continuous control and 'steering' of sequence generation using an ensemble of RNNs and dynamically altering the mixture weights of the models. We demonstrate the method using character based LSTM networks and a gestural interface allowing users to 'conduct' the generation of text.