SD LG AS · Jul 31, 2023

Exploring how a Generative AI interprets music

Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn, Veronica Sanz

arXiv:2308.00015v1 · 8.49 citations · h-index: 53

Originality: Synthesis-oriented

AI Analysis

This work provides incremental insight into how generative AI models represent musical features, which could help improve music generation and analysis tools.

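The study investigated how Google's MusicVAE interprets music by analyzing its latent space. It found that only a few dozen of the 512 latent dimensions ("music neurons") are active on real music, that most pitch and rhythm information is encoded non-linearly in the first few of these, and that melody appears in separate neurons only for longer sequences.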

We use Google's MusicVAE, a Variational Auto-Encoder with a 512-dimensional latent space to represent a few bars of music, and organize the latent dimensions according to their relevance in describing music. We find that, on average, most latent neurons remain silent when fed real music tracks: we call these "noise" neurons. The remaining few dozen latent neurons that do fire are called "music neurons". We ask which neurons carry the musical information and what kind of musical information they encode, namely something that can be identified as pitch, rhythm or melody. We find that most of the information about pitch and rhythm is encoded in the first few music neurons: the neural network has thus constructed a couple of variables that non-linearly encode many human-defined variables used to describe pitch and rhythm. The concept of melody only seems to show up in independent neurons for longer sequences of music.
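To make the split between "silent" and firing latent dimensions concrete, the following sketch uses the active-units statistic common in the VAE literature: the variance, across inputs, of each dimension's posterior mean, with dimensions above a 0.01 threshold counted as active. This is a standard criterion swapped in for illustration, not necessarily the ranking the authors use. It assumes `mu` is an `(n_inputs, 512)` array of encoder means (obtaining these from MusicVAE is outside the sketch), and the function name `rank_latent_dims` is hypothetical. The demo uses synthetic data, not real MusicVAE encodings.

```python
import numpy as np


def rank_latent_dims(mu, threshold=0.01):
    """Hypothetical helper: rank latent dims by activity (active-units statistic).

    mu: array of shape (n_inputs, latent_dim) holding the encoder's posterior
        means E_q(z|x)[z], one row per encoded input (e.g. a few bars of music).
    Returns (order, activity, active_mask):
        order       -- dimension indices sorted from most to least active
        activity    -- per-dimension variance of the posterior mean over inputs
        active_mask -- True where activity exceeds the threshold
    """
    mu = np.asarray(mu, dtype=float)
    # A dimension whose posterior mean barely changes across inputs is
    # ignoring the input, i.e. it behaves like a "silent" (noise) neuron.
    activity = mu.var(axis=0)
    order = np.argsort(activity)[::-1]  # most active first
    active_mask = activity > threshold
    return order, activity, active_mask


# Synthetic demo (assumption): 512 dims whose means are nearly constant,
# except 20 dims that vary strongly across 1000 fake inputs.
rng = np.random.default_rng(0)
n_inputs, latent_dim = 1000, 512
mu = rng.normal(0.0, 0.01, size=(n_inputs, latent_dim))
mu[:, :20] += rng.normal(0.0, 1.0, size=(n_inputs, 20))

order, activity, active = rank_latent_dims(mu)
print(int(active.sum()))                          # 20
print(set(order[:20].tolist()) == set(range(20)))  # True
```

Sorting by this activity gives one possible "relevance" ordering of the latent dimensions; a variance criterion only detects whether a dimension responds to the input, not which musical feature (pitch, rhythm, melody) it encodes, which the paper probes separately.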

