SD · LG · AS · Oct 5, 2023

Deep Generative Models of Music Expectation

arXiv:2310.03500v1 · 3 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurately representing musical surprisal for affective response theories, offering an incremental improvement over prior probabilistic models by leveraging deep generative techniques.

The authors tackled the problem of modeling musical expectation and surprisal by using a deep diffusion model to compute approximate likelihoods of musical sequences, showing it yields surprisal values with a negative quadratic relationship to human liking ratings, competitive with state-of-the-art methods like IDyOM.

A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to computing exact probabilities through hand-crafted features, or restricted to linear models, which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state-of-the-art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.
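
The "negative quadratic relationship" described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the log-likelihoods and liking ratings below are synthetic stand-ins for a diffusion model's approximate likelihoods and for measured ratings, and the helper names (`surprisal_from_log_likelihood`, `fit_quadratic`) are hypothetical. It assumes surprisal is taken as the negative log-likelihood and that liking is regressed on surprisal with a second-degree polynomial; an inverted U corresponds to a negative leading coefficient.

```python
import numpy as np


def surprisal_from_log_likelihood(log_likelihood):
    """Surprisal as negative log-likelihood (in nats if the input is in nats)."""
    return -np.asarray(log_likelihood, dtype=float)


def fit_quadratic(surprisal, liking):
    """Least-squares fit of liking ~ a*s**2 + b*s + c; returns (a, b, c)."""
    a, b, c = np.polyfit(surprisal, liking, deg=2)
    return a, b, c


# Synthetic, illustrative data only (NOT from the paper): per-song
# log-likelihoods and liking ratings generated from an inverted U plus noise.
rng = np.random.default_rng(0)
log_lik = rng.uniform(-10.0, -1.0, size=200)
s = surprisal_from_log_likelihood(log_lik)
liking = -0.2 * (s - 5.5) ** 2 + 6.0 + rng.normal(0.0, 0.3, size=200)

a, b, c = fit_quadratic(s, liking)

# For a < 0 the fitted parabola opens downward; its vertex is the
# surprisal level at which predicted liking is highest.
peak = -b / (2 * a)
print(f"quadratic coefficient a = {a:.3f}, peak liking at surprisal = {peak:.2f}")
```

With real data, `log_lik` would be replaced by whatever per-song likelihood estimate the model provides, and the sign and significance of `a` would be the quantity of interest.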

