ML Jun 5, 2023
Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Xinlei Niu, Christian Walder, Jing Zhang et al.
We propose the stochastic optimal path, which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE, which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.
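The idea of paths on a DAG following a Gibbs distribution can be illustrated with a small sketch. This is not the authors' implementation and it does not use Gumbel perturbations: it is plain log-sum-exp message passing followed by ancestral sampling, which draws a path with probability proportional to exp(alpha * path weight). The toy graph, the temperature parameter `alpha`, and all function names are assumptions made for illustration, and the code assumes node ids are already in topological order and every node can reach the sink.

```python
import math
import random

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

# Hypothetical toy DAG: EDGES[u] lists (successor, edge weight) pairs.
# Node ids are assumed to be topologically ordered (0 before 1 before ...).
EDGES = {0: [(1, 1.0), (2, 0.5)], 1: [(3, 0.2)], 2: [(3, 1.5)], 3: []}
SOURCE, SINK = 0, 3

def backward_messages(edges, sink, alpha=1.0):
    """msg[u] = log of the sum, over all u->sink paths, of exp(alpha * weight).

    Replacing logsumexp with max would give the classical optimal-path DP.
    """
    msg = {sink: 0.0}
    for u in sorted(edges, reverse=True):  # reverse topological order
        if u != sink:
            msg[u] = logsumexp([alpha * w + msg[v] for v, w in edges[u]])
    return msg

def sample_path(edges, source, sink, alpha=1.0, rng=random):
    """Draw one source->sink path from the Gibbs distribution over paths."""
    msg = backward_messages(edges, sink, alpha)
    path, u = [source], source
    while u != sink:
        successors = edges[u]
        # Transition probabilities; they sum to 1 by construction of msg[u].
        probs = [math.exp(alpha * w + msg[v] - msg[u]) for v, w in successors]
        u = rng.choices([v for v, _ in successors], weights=probs)[0]
        path.append(u)
    return path

if __name__ == "__main__":
    print(backward_messages(EDGES, SINK)[SOURCE])  # log partition function
    print(sample_path(EDGES, SOURCE, SINK))
```

With a large `alpha` the sampled path concentrates on the highest-weight path (here 0, 2, 3), which is the sense in which the classical optimal path appears as a limit of the softened problem.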
29.3 SD Apr 28
Huí Sù: Co-constructing a Dual Feedback Apparatus
Yichen Wang, Charles Patrick Martin
This performance presents a duet between two intelligent musical instruments, Sù (to trace back; to go upstream) and Agentier (playing on agentic clavier), and their human performers, connected through feedback loops. Rather than treating AI as a tool that responds predictably to input, both systems operate recursively, where past actions continuously influence future behaviour. The Sù operates in the audio space through latent representation. Its performer uses Make Noise 0-series synthesisers and MIDI controllers to work with a neural feedback synthesis system based on a RAVE model, with a latent feedback loop embedded within the model's internal structure. This allows the instrument to remember and reuse its own internal states, influencing ongoing sound generation through its recent sonic history. The Agentier functions in the control space. Its performer interacts with the system using a Roland S-1 synthesiser and Keith McMillen QuNeo touchpad, where control gestures are routed into a recurrent neural network that feeds back into the synthesis process. Through this feedback loop, the system actively shapes the evolution of control signals over time. Contrasting feedback in the audio and control domains, the performance explores shared agency, resistance, and negotiation between humans and intelligent musical systems. Musical phenomena are co-produced through the entangled states of interaction, rather than through pre-existing system configuration or fixed mappings.
6.8 SD Apr 26
Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments
Charles Patrick Martin
Machine generation of symbolic music and digital audio are hot topics, but there have been relatively few digital musical instruments that integrate generative AI. Present musical AI tools are not artist-centred and do not support experimentation or integration into musical instruments or practices. This work introduces an inexpensive generative AI instrument platform based on a single-board computer that connects via MIDI to other musical devices. The platform uses artist-collected datasets with models trained on a regular computer. This paper asks what the design space of intelligent musical instruments might look like when accessible and portable AI systems are available for artistic exploration. I contribute five examples of instruments created and tested through a two-year first-person artistic research process. These show that (re)mapping can replace retraining for discovering AI interaction, that fast input interleaving is a new co-creative strategy, that small-data AI models can be a transportable design resource, and that cheap hardware can lower barriers to inclusion. This work could enable artists to explore new interaction and performance schemes with intelligent musical instruments.
HC Dec 4, 2020
Composing an Ensemble Standstill Work for Myo and Bela
Charles Patrick Martin, Alexander Refsum Jensenius, Jim Torresen
This paper describes the process of developing a standstill performance work using the Myo gesture control armband and the Bela embedded computing platform. The combination of Myo and Bela allows a portable and extensible version of the standstill performance concept while introducing muscle tension as an additional control parameter. We describe the technical details of our setup and introduce Myo-to-Bela and Myo-to-OSC software bridges that assist with prototyping compositions using the Myo controller.
HC Dec 3, 2020
A Laptop Ensemble Performance System using Recurrent Neural Networks
Rohan Proctor, Charles Patrick Martin
The popularity of applying machine learning techniques in musical domains has made pre-trained neural network (NN) models freely available and ready for use in creative applications. This work outlines the implementation of one such application in the form of an assistance tool designed for live improvisational performances by laptop ensembles. The primary intention was to leverage off-the-shelf pre-trained NN models as a basis for assisting individual performers either as musical novices looking to engage with more experienced performers or as a tool to expand musical possibilities through new forms of creative expression. The system expands upon a variety of ideas found in different research areas including new interfaces for musical expression, generative music and group performance to produce a networked performance solution served via a web-browser interface. The final implementation of the system offers performers a mixture of high and low-level controls to influence the shape of sequences of notes output by locally run NN models in real time, also allowing performers to define their level of engagement with the assisting generative models. Two test performances were played, with the system shown to feasibly support four performers over a four-minute piece while producing musically cohesive and engaging music. Iterations on the design of the system exposed technical constraints on the use of a JavaScript environment for generative models in a live music context, largely derived from inescapable processing overheads.
HC Dec 3, 2020
Sonic Sculpture: Activating Engagement with Head-Mounted Augmented Reality
Charles Patrick Martin, Zeruo Liu, Yichen Wang et al.
This work examines how head-mounted AR can be used to build an interactive sonic landscape to engage with a public sculpture. We describe a sonic artwork, "Listening To Listening", that has been designed to accompany a real-world sculpture with two prototype interaction schemes. Our artwork is created for the HoloLens platform so that users can have an individual experience in a mixed reality context. Personal head-mounted AR systems have recently become available and practical for integration into public art projects; however, research into sonic sculpture works has yet to account for the affordances of current portable and mainstream AR systems. In this work, we take advantage of the HoloLens' spatial awareness to build sonic spaces that have a precise spatial relationship to a given sculpture and where the sculpture itself is modelled in the augmented scene as an "invisible hologram". We describe the artistic rationale for our artwork, the design of the two interaction schemes, and the technical and usability feedback that we have obtained from demonstrations during iterative development.
RO Feb 12, 2019
Evolving Robots on Easy Mode: Towards a Variable Complexity Controller for Quadrupeds
Tønnes Frostad Nygaard, Charles Patrick Martin, Jim Torresen et al.
The complexity of a legged robot's environment or task can inform how specialised its gait must be to ensure success. Evolving specialised robotic gaits demands many evaluations, which is acceptable for computer simulations but not for physical robots. For some tasks, a more general gait, with lower optimization costs, could be satisfactory. In this paper, we introduce a new type of gait controller where complexity can be set by a single parameter, using a dynamic genotype-phenotype mapping. Low controller complexity leads to conservative gaits, while higher complexity allows more sophistication and high performance for demanding tasks, at the cost of optimization effort. We investigate the new controller on a virtual robot in simulations and do preliminary testing on a real-world robot. We show that having variable complexity allows us to adapt to different optimization budgets. With a high evaluation budget in simulation, a complex controller performs best. Moreover, real-world evolution with a limited evaluation budget indicates that a lower gait complexity is preferable for a relatively simple environment.
LG Jan 23, 2019
How do Mixture Density RNNs Predict the Future?
Kai Olav Ellefsen, Charles Patrick Martin, Jim Torresen
Gaining a better understanding of how and what machine learning systems learn is important to increase confidence in their decisions and catalyze further research. In this paper, we analyze the predictions made by a specific type of recurrent neural network, mixture density RNNs (MD-RNNs). These networks learn to model predictions as a combination of multiple Gaussian distributions, making them particularly interesting for problems where a sequence of inputs may lead to several distinct future possibilities. An example is learning internal models of an environment, where different events may or may not occur, but where the average over different events is not meaningful. By analyzing the predictions made by trained MD-RNNs, we find that their different Gaussian components have two complementary roles: 1) separately modeling different stochastic events and 2) separately modeling scenarios governed by different rules. These findings increase our understanding of what is learned by predictive MD-RNNs, and open up new research directions for further understanding how we can benefit from their self-organizing model decomposition.
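The remark that the average over different events is not meaningful can be made concrete with a one-dimensional Gaussian mixture, the kind of output distribution a mixture density network parameterises. The sketch below is only an illustration under assumed values: the two components at -2 and +2, their weights and standard deviations, and the function names are hypothetical and are not taken from the paper or from any trained MD-RNN.

```python
import math
import random

def mixture_log_likelihood(y, weights, means, stds):
    """log p(y) under a 1-D Gaussian mixture, computed stably in log space."""
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((y - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def sample_mixture(weights, means, stds, rng=random):
    """Pick a component by its weight, then sample from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])

# Hypothetical prediction with two distinct possible futures.
WEIGHTS, MEANS, STDS = [0.5, 0.5], [-2.0, 2.0], [0.1, 0.1]

# The mean of the mixture falls between the two modes.
mixture_mean = sum(w * m for w, m in zip(WEIGHTS, MEANS))

if __name__ == "__main__":
    print("mixture mean:", mixture_mean)
    print("log p(mean):", mixture_log_likelihood(mixture_mean, WEIGHTS, MEANS, STDS))
    print("log p(+2.0):", mixture_log_likelihood(2.0, WEIGHTS, MEANS, STDS))
    print("one sample:", sample_mixture(WEIGHTS, MEANS, STDS))
```

Here the mixture mean is a value the model considers extremely unlikely, whereas each sample lands near one of the two modes; a single-Gaussian predictor trained on the same data would tend to report that unlikely average.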