SDLGMMASNov 7, 2023

Exploring Latent Spaces of Tonal Music using Variational Autoencoders

arXiv:2311.03621v11 citationsh-index: 3Has Code
Originality Synthesis-oriented
AI Analysis

This work addresses music cognition and computational musicology by providing a method to model tonal hierarchies, though it is incremental as it applies existing VAE methods to new music data and encodings.

The paper tackled the problem of representing tonal music in latent spaces using Variational Autoencoders (VAEs) to align with cognitive pitch spaces, finding that Pitch DFT encoding performed best in capturing key relations and tonal hierarchies with concrete metrics like MSE and KL-divergence.

Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach's chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent space of different VAE corpus encodings -- Piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions -- in providing a pitch space for key relations that align with cognitive distances. We evaluate the model performance of these encodings using objective metrics to capture accuracy, mean square error (MSE), KL-divergence, and computational cost. The ABC encoding performs the best in reconstructing the original data, while the Pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space where overlapping objects within a key are fuzzy clusters, which impose a well-defined order of structural significance or stability -- i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchies (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes