LGApr 24, 2023Code
Towards Mode Balancing of Generative Models via Diversity WeightsSebastian Berns, Simon Colton, Christian Guckelsberger
Large data-driven image models are extensively used to support creative and artistic work. Under the currently predominant distribution-fitting paradigm, a dataset is treated as ground truth to be approximated as closely as possible. Yet, many creative applications demand a diverse range of output, and creators often strive to actively diverge from a given data distribution. We argue that an adjustment of modelling objectives, from pure mode coverage towards mode balancing, is necessary to accommodate the goal of higher output diversity. We present diversity weights, a training scheme that increases a model's output diversity by balancing the modes in the training dataset. First experiments in a controlled setting demonstrate the potential of our method. We discuss connections of our approach to diversity, equity, and inclusion in generative machine learning more generally, and computational creativity specifically. An implementation of our algorithm is available at https://github.com/sebastianberns/diversity-weights
AIAug 10, 2023
Exploring XAI for the Arts: Explaining Latent Space in Generative MusicNick Bryan-Kinns, Berker Banar, Corey Ford et al. · bytedance
Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable.
CYMay 11, 2022
The Beyond the Fence Musical and Computer Says Show DocumentarySimon Colton, Maria Teresa Llano, Rose Hepworth et al.
During 2015 and early 2016, the cultural application of Computational Creativity research and practice took a big leap forward, with a project where multiple computational systems were used to provide advice and material for a new musical theatre production. Billed as the world's first 'computer musical... conceived by computer and substantially crafted by computer', Beyond The Fence was staged in the Arts Theatre in London's West End during February and March of 2016. Various computational approaches to analytical and generative sub-projects were used to bring about the musical, and these efforts were recorded in two 1-hour documentary films made by Wingspan Productions, which were aired on SkyArts under the title Computer Says Show. We provide details here of the project conception and execution, including details of the systems which took on some of the creative responsibility in writing the musical, and the contributions they made. We also provide details of the impact of the project, including a perspective from the two (human) writers with overall control of the creative aspects the musical.
HCMay 11, 2022
Explainable Computational CreativityMaria Teresa Llano, Mark d'Inverno, Matthew Yee-King et al.
Human collaboration with systems within the Computational Creativity (CC) field is often restricted to shallow interactions, where the creative processes, of systems and humans alike, are carried out in isolation, without any (or little) intervention from the user, and without any discussion about how the unfolding decisions are taking place. Fruitful co-creation requires a sustained ongoing interaction that can include discussions of ideas, comparisons to previous/other works, incremental improvements and revisions, etc. For these interactions, communication is an intrinsic factor. This means giving a voice to CC systems and enabling two-way communication channels between them and their users so that they can: explain their processes and decisions, support their ideas so that these are given serious consideration by their creative collaborators, and learn from these discussions to further improve their creative processes. For this, we propose a set of design principles for CC systems that aim at supporting greater co-creation and collaboration with their human collaborators.
AIFeb 1, 2023
Trash to Treasure: Using text-to-image models to inform the design of physical artefactsAmy Smith, Hope Schroeder, Ziv Epstein et al.
Text-to-image generative models have recently exploded in popularity and accessibility. Yet so far, use of these models in creative tasks that bridge the 2D digital world and the creation of physical artefacts has been understudied. We conduct a pilot study to investigate if and how text-to-image models can be used to assist in upstream tasks within the creative process, such as ideation and visualization, prior to a sculpture-making activity. Thirty participants selected sculpture-making materials and generated three images using the Stable Diffusion text-to-image generator, each with text prompts of their choice, with the aim of informing and then creating a physical sculpture. The majority of participants (23/30) reported that the generated images informed their sculptures, and 28/30 reported interest in using text-to-image models to help them in a creative task in the future. We identify several prompt engineering strategies and find that a participant's prompting strategy relates to their stage in the creative process. We discuss how our findings can inform support for users at different stages of the design process and for using text-to-image models for physical artefact design.
SDMay 13Code
Text2Score: Generating Sheet Music From Textual PromptsKeshav Bhandari, Sungkyun Chang, Abhinaba Roy et al.
Developing text-driven symbolic music generation models remains challenging due to the scarcity of aligned text-music datasets and the unreliability of automated captioning pipelines. While most efforts have focused on MIDI, sheet music representations are largely underexplored in text-driven generation. We present Text2Score, a two-stage framework comprising a planning stage and an execution stage for generating sheet music from natural language prompts. By deriving supervision signals directly from symbolic XML data, we propose an alternative training paradigm that bypasses noisy or scarce text-music pairs. In the planning stage, an LLM orchestrator translates a natural language prompt into a structured measure-wise plan defining musical attributes such as instruments, key, time signatures, harmony, etc. This plan is then consumed by a generative model in the execution stage to produce interleaved ABC notation conditioned on the plan's structural constraints. To assess output quality, we introduce an evaluation framework covering playability, readability, instrument utilization, structural complexity, and prompt adherence, validated by expert musicians. Text2Score consistently outperforms both a pure LLM-based agentic framework and three end-to-end baselines across objective and subjective dimensions. We open-source the dataset, code, evaluation set and LLM prompts used in this work; a demo is available on our project page (https://keshavbhandari.github.io/portfolio/text2score).
SDDec 21, 2024Code
Text2midi: Generating Symbolic Music from CaptionsKeshav Bhandari, Abhinaba Roy, Kyra Wang et al.
This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions. Leveraging the growing popularity of multimodal generative approaches, text2midi capitalizes on the extensive availability of textual data and the success of large language models (LLMs). Our end-to-end system harnesses the power of LLMs to generate symbolic music in the form of MIDI files. Specifically, we utilize a pretrained LLM encoder to process captions, which then condition an autoregressive transformer decoder to produce MIDI sequences that accurately reflect the provided descriptions. This intuitive and user-friendly method significantly streamlines the music creation process by allowing users to generate music pieces using text prompts. We conduct comprehensive empirical evaluations, incorporating both automated and human studies, that show our model generates MIDI files of high quality that are indeed controllable by text captions that may include music theory terms such as chords, keys, and tempo. We release the code and music samples on our demo page (https://github.com/AMAAI-Lab/Text2midi) for users to interact with text2midi.
SDApr 21, 2025Code
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music ModelingLouis Bradshaw, Simon Colton
We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. The data pipeline we use is multi-stage, employing a language model to autonomously crawl and score audio recordings from the internet based on their metadata, followed by a stage of pruning and segmentation using an audio classifier. The resulting dataset contains over one million distinct MIDI files, comprising roughly 100,000 hours of transcribed audio. We provide an in-depth analysis of our techniques, offering statistical insights, and investigate the content by extracting metadata tags, which we also provide. Dataset available at https://github.com/loubbrad/aria-midi.
SDFeb 6, 2025Code
ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption RefinementKeshav Bhandari, Sungkyun Chang, Tongyu Lu et al.
Despite deep learning's remarkable advances in style transfer across various domains, generating controllable performance-level musical style transfer for complete symbolically represented musical works remains a challenging area of research. Much of this is owed to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks. This paper presents ImprovNet, a transformer-based architecture that generates expressive and controllable musical improvisations through a self-supervised corruption-refinement training strategy. The improvisational style transfer is aimed at making meaningful modifications to one or more musical elements - melody, harmony or rhythm of the original composition with respect to the target genre. ImprovNet unifies multiple capabilities within a single model: it can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks. The model's iterative generation framework allows users to control the degree of style transfer and structural similarity to the original composition. Objective and subjective evaluations demonstrate ImprovNet's effectiveness in generating musically coherent improvisations while maintaining structural relationships with the original pieces. The model outperforms Anticipatory Music Transformer in short continuation and infilling tasks and successfully achieves recognizable genre conversion, with 79\% of participants correctly identifying jazz-style improvisations of classical pieces. Our code and demo page can be found at https://github.com/keshavbhandari/improvnet.
SDJan 29, 2025Code
Yin-Yang: Developing Motifs With Long-Term Structure And ControllabilityKeshav Bhandari, Geraint A. Wiggins, Simon Colton
Transformer models have made great strides in generating symbolically represented music with local coherence. However, controlling the development of motifs in a structured way with global form remains an open research area. One of the reasons for this challenge is due to the note-by-note autoregressive generation of such models, which lack the ability to correct themselves after deviations from the motif. In addition, their structural performance on datasets with shorter durations has not been studied in the literature. In this study, we propose Yin-Yang, a framework consisting of a phrase generator, phrase refiner, and phrase selector models for the development of motifs into melodies with long-term structure and controllability. The phrase refiner is trained on a novel corruption-refinement strategy which allows it to produce melodic and rhythmic variations of an original motif at generation time, thereby rectifying deviations of the phrase generator. We also introduce a new objective evaluation metric for quantifying how smoothly the motif manifests itself within the piece. Evaluation results show that our model achieves better performance compared to state-of-the-art transformer models while having the advantage of being controllable and making the generated musical structure semi-interpretable, paving the way for musical analysis. Our code and demo page can be found at https://github.com/keshavbhandari/yinyang.
SDNov 3, 2025
The Ghost in the Keys: A Disklavier Demo for Human-AI Musical Co-CreativityLouis Bradshaw, Alexander Spangher, Stella Biderman et al.
While generative models for music composition are increasingly capable, their adoption by musicians is hindered by text-prompting, an asynchronous workflow disconnected from the embodied, responsive nature of instrumental performance. To address this, we introduce Aria-Duet, an interactive system facilitating a real-time musical duet between a human pianist and Aria, a state-of-the-art generative model, using a Yamaha Disklavier as a shared physical interface. The framework enables a turn-taking collaboration: the user performs, signals a handover, and the model generates a coherent continuation performed acoustically on the piano. Beyond describing the technical architecture enabling this low-latency interaction, we analyze the system's output from a musicological perspective, finding the model can maintain stylistic semantics and develop coherent phrasal ideas, demonstrating that such embodied systems can engage in musically sophisticated dialogue and open a promising new path for human-AI co-creation.
SDMar 12, 2024
Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music GenerationKeshav Bhandari, Simon Colton
Modelling musical structure is vital yet challenging for artificial intelligence systems that generate symbolic music compositions. This literature review dissects the evolution of techniques for incorporating coherent structure, from symbolic approaches to foundational and transformative deep learning methods that harness the power of computation and data across a wide variety of training paradigms. In the later stages, we review an emerging technique which we refer to as "sub-task decomposition" that involves decomposing music generation into separate high-level structural planning and content creation stages. Such systems incorporate some form of musical knowledge or neuro-symbolic methods by extracting melodic skeletons or structural templates to guide the generation. Progress is evident in capturing motifs and repetitions across all three eras reviewed, yet modelling the nuanced development of themes across extended compositions in the style of human composers remains difficult. We outline several key future directions to realize the synergistic benefits of combining approaches from all eras examined.
SDJun 30, 2025
Scaling Self-Supervised Representation Learning for Symbolic Piano PerformanceLouis Bradshaw, Honglu Fan, Alexander Spangher et al.
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset, to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a few hundred labeled examples to specialize to downstream tasks.
LGJul 12, 2021
Active Divergence with Generative Deep Learning -- A Survey and TaxonomyTerence Broad, Sebastian Berns, Simon Colton et al.
Generative deep learning systems offer powerful tools for artefact generation, given their ability to model distributions of data and generate high-fidelity results. In the context of computational creativity, however, a major shortcoming is that they are unable to explicitly diverge from the training data in creative ways and are limited to fitting the target data distribution. To address these limitations, there have been a growing number of approaches for optimising, hacking and rewriting these models in order to actively diverge from the training data. We present a taxonomy and comprehensive survey of the state of the art of active divergence techniques, highlighting the potential for computational creativity researchers to advance these methods and use deep generative models in truly creative systems.
LGJul 5, 2021
Automating Generative Deep Learning for Artistic Purposes: Challenges and OpportunitiesSebastian Berns, Terence Broad, Christian Guckelsberger et al.
We present a framework for automating generative deep learning with a specific focus on artistic applications. The framework provides opportunities to hand over creative responsibilities to a generative system as targets for automation. For the definition of targets, we adopt core concepts from automated machine learning and an analysis of generative deep learning pipelines, both in standard and artistic settings. To motivate the framework, we argue that automation aligns well with the goal of increasing the creative responsibility of a generative system, a central theme in computational creativity research. We understand automation as the challenge of granting a generative system more creative autonomy, by framing the interaction between the user and the system as a co-creative process. The development of the framework is informed by our analysis of the relationship between automation and creative autonomy. An illustrative example shows how the framework can give inspiration and guidance in the process of handing over creative responsibility.
LGMay 10, 2021
Expressivity of Parameterized and Data-driven Representations in Quality Diversity SearchAlexander Hagg, Sebastian Berns, Alexander Asteroth et al.
We consider multi-solution optimization and generative models for the generation of diverse artifacts and the discovery of novel solutions. In cases where the domain's factors of variation are unknown or too complex to encode manually, generative models can provide a learned latent space to approximate these factors. When used as a search space, however, the range and diversity of possible outputs are limited to the expressivity and generative capabilities of the learned model. We compare the output diversity of a quality diversity evolutionary search performed in two different search spaces: 1) a predefined parameterized space and 2) the latent space of a variational autoencoder model. We find that the search on an explicit parametric encoding creates more diverse artifact sets than searching the latent space. A learned model is better at interpolating between known data points than at extrapolating or expanding towards unseen examples. We recommend using a generative model's latent space primarily to measure similarity between artifacts rather than for search and generation. Whenever a parametric encoding is obtainable, it should be preferred over a learned representation as it produces a higher diversity of solutions.
MMMar 6, 2019
Camera Obscurer: Generative Art for Design InspirationDilpreet Singh, Nina Rajcic, Simon Colton et al.
We investigate using generated decorative art as a source of inspiration for design tasks. Using a visual similarity search for image retrieval, the \emph{Camera Obscurer} app enables rapid searching of tens of thousands of generated abstract images of various types. The seed for a visual similarity search is a given image, and the retrieved generated images share some visual similarity with the seed. Implemented in a hand-held device, the app empowers users to use photos of their surroundings to search through the archive of generated images and other image archives. Being abstract in nature, the retrieved images supplement the seed image rather than replace it, providing different visual stimuli including shapes, colours, textures and juxtapositions, in addition to affording their own interpretations. This approach can therefore be used to provide inspiration for a design task, with the abstract images suggesting new ideas that might give direction to a graphic design project. We describe a crowdsourcing experiment with the app to estimate user confidence in retrieved images, and we describe a pilot study where Camera Obscurer provided inspiration for a design task. These experiments have enabled us to describe future improvements, and to begin to understand sources of visual inspiration for design tasks.
AIMar 4, 2018
Exploring Novel Game Spaces with Fluidic GamesSwen E. Gaudl, Mark J. Nelson, Simon Colton et al.
With the growing integration of smartphones into our daily lives, and their increased ease of use, mobile games have become highly popular across all demographics. People listen to music, play games or read the news while in transit or bridging gap times. While mobile gaming is gaining popularity, mobile expression of creativity is still in its early stages. We present here a new type of mobile app -- fluidic games -- and illustrate our iterative approach to their design. This new type of app seamlessly integrates exploration of the design space into the actual user experience of playing the game, and aims to enrich the user experience. To better illustrate the game domain and our approach, we discuss one specific fluidic game, which is available as a commercial product. We also briefly discuss open challenges such as player support and how generative techniques can aid the exploration of the game space further.
AIJun 9, 2017
Off The Beaten Lane: AI Challenges In MOBAs Beyond Player ControlMichael Cook, Adam Summerville, Simon Colton
MOBAs represent a huge segment of online gaming and are growing as both an eSport and a casual genre. The natural starting point for AI researchers interested in MOBAs is to develop an AI to play the game better than a human - but MOBAs have many more challenges besides adversarial AI. In this paper we introduce the reader to the wider context of MOBA culture, propose a range of challenges faced by the community today, and posit concrete AI projects that can be undertaken to begin solving them.
SEMar 2, 2016
Semi-Automated Design Space Exploration for Formal ModellingGudmund Grov, Andrew Ireland, Maria Teresa Llano et al.
Refinement based formal methods allow the modelling of systems through incremental steps via abstraction. Discovering the right levels of abstraction, formulating correct and meaningful invariants, and analysing faulty models are some of the challenges faced when using this technique. Here, we propose Design Space Exploration, an approach that aims to assist a designer by automatically providing high-level modelling guidance in real-time. More specifically, through the combination of common patterns of modelling with techniques from automated theory formation and automated reasoning, different design alternatives are explored and suitable models that deal with faults are proposed.
AIJul 25, 2015
The Digital Synaptic Neural Substrate: A New Approach to Computational CreativityAzlan Iqbal, Matej Guid, Simon Colton et al.
We introduce a new artificial intelligence (AI) approach called, the 'Digital Synaptic Neural Substrate' (DSNS). It uses selected attributes from objects in various domains (e.g. chess problems, classical music, renowned artworks) and recombines them in such a way as to generate new attributes that can then, in principle, be used to create novel objects of creative value to humans relating to any one of the source domains. This allows some of the burden of creative content generation to be passed from humans to machines. The approach was tested in the domain of chess problem composition. We used it to automatically compose numerous sets of chess problems based on attributes extracted and recombined from chess problems and tournament games by humans, renowned paintings, computer-evolved abstract art, photographs of people, and classical music tracks. The quality of these generated chess problems was then assessed automatically using an existing and experimentally-validated computational chess aesthetics model. They were also assessed by human experts in the domain. The results suggest that attributes collected and recombined from chess and other domains using the DSNS approach can indeed be used to automatically generate chess problems of reasonably high aesthetic quality. In particular, a low quality chess source (i.e. tournament game sequences between weak players) used in combination with actual photographs of people was able to produce three-move chess problems of comparable quality or better to those generated using a high quality chess source (i.e. published compositions by human experts), and more efficiently as well. Why information from a foreign domain can be integrated and functional in this way remains an open question for now. The DSNS approach is, in principle, scalable and applicable to any domain in which objects have attributes that can be represented using real numbers.
AINov 3, 2014
Modelling serendipity in a computational contextJoseph Corneli, Anna Jordanous, Christian Guckelsberger et al.
The term serendipity describes a creative process that develops, in context, with the active participation of a creative agent, but not entirely within that agent's control. While a system cannot be made to perform serendipitously on demand, we argue that its $\mathit{serendipity\ potential}$ can be increased by means of a suitable system architecture and other design choices. We distil a unified description of serendipitous occurrences from historical theorisations of serendipity and creativity. This takes the form of a framework with six phases: $\mathit{perception}$, $\mathit{attention}$, $\mathit{interest}$, $\mathit{explanation}$, $\mathit{bridge}$, and $\mathit{valuation}$. We then use this framework to organise a survey of literature in cognitive science, philosophy, and computing, which yields practical definitions of the six phases, along with heuristics for implementation. We use the resulting model to evaluate the serendipity potential of four existing systems developed by others, and two systems previously developed by two of the authors. Most existing research that considers serendipity in a computing context deals with serendipity as a service; here we relate theories of serendipity to the development of autonomous systems and computational creativity practice. We argue that serendipity is not teleologically blind, and outline representative directions for future applications of our model. We conclude that it is feasible to equip computational systems with the potential for serendipity, and that this could be beneficial in varied computational creativity/AI applications, particularly those designed to operate responsively in real-world contexts.