SDLGAS · Dec 31, 2021

Evaluating Deep Music Generation Methods Using Data Augmentation

arXiv:2201.00052v1
Originality: Incremental advance
AI Analysis

This work addresses the need for objective evaluation in music generation research and offers researchers and practitioners a new evaluation framework, although its core idea, applying data augmentation to this domain, is an incremental advance.

The paper tackles the problem of evaluating deep music generation methods by proposing an objective framework that measures whether generated samples carry meaningful information, such as emotion or mood/theme, by augmenting the training data of a classifier with those samples. The authors report that augmenting the training data with samples generated by SampleRNN, Jukebox, and DDSP improved classification performance, with gains in mood/theme prediction.
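
The augmentation measurement reduces to a difference in test score: train the classifier once on real data and once on real plus generated data, then score both on the same held-out real test set. The sketch below is a minimal illustration under stated assumptions. The nearest-centroid classifier, the accuracy metric, and every function name (`fit_nearest_centroid`, `predict`, `accuracy`, `augmentation_delta`) are hypothetical stand-ins, not the authors' mood/theme classifier or evaluation metric, and samples are assumed to be precomputed feature vectors rather than raw audio.

```python
from collections import defaultdict

Vector = list[float]  # assumption: each sample is already a feature vector


def fit_nearest_centroid(X: list[Vector], y: list[int]) -> dict[int, Vector]:
    """Toy stand-in classifier (hypothetical): store the mean feature vector of each class."""
    grouped: dict[int, list[Vector]] = defaultdict(list)
    for x, label in zip(X, y):
        grouped[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids: dict[int, Vector], x: Vector) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def accuracy(centroids: dict[int, Vector], X: list[Vector], y: list[int]) -> float:
    """Assumed metric: fraction of correctly classified samples."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)


def augmentation_delta(real_train, generated, real_test) -> float:
    """Change in test score on REAL data when generated samples are added to training.

    A positive value suggests the generated samples carry label-relevant
    information; zero or a negative value suggests they add nothing or hurt.
    """
    X_real, y_real = real_train
    X_gen, y_gen = generated
    X_test, y_test = real_test
    baseline = accuracy(fit_nearest_centroid(X_real, y_real), X_test, y_test)
    augmented = accuracy(fit_nearest_centroid(X_real + X_gen, y_real + y_gen), X_test, y_test)
    return augmented - baseline


if __name__ == "__main__":
    real_train = ([[0.0], [2.0]], [0, 1])
    real_test = ([[1.2], [3.0]], [0, 1])
    generated = ([[1.5]], [0])  # one hypothetical generated sample conditioned on class 0
    print(augmentation_delta(real_train, generated, real_test))  # 0.5
```

The key design point is that the test set stays purely real, so any score change is attributable to what the generated training samples contributed.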

Despite advances in deep algorithmic music generation, evaluation of generated samples often relies on human evaluation, which is subjective and costly. We focus on designing a homogeneous, objective framework for evaluating samples of algorithmically generated music. Engineered measures for evaluating generated music typically attempt to define the samples' musicality, but do not capture qualities of music such as theme or mood. We do not seek to assess the musical merit of generated music, but instead explore whether generated samples contain meaningful information pertaining to emotion or mood/theme. We achieve this by measuring the change in predictive performance of a music mood/theme classifier after augmenting its training data with generated samples. We analyse music samples generated by three models (SampleRNN, Jukebox, and DDSP) and employ a homogeneous framework across all methods to allow for objective comparison. This is the first attempt at augmenting a music genre classification dataset with conditionally generated music. Using an additional emotion annotation of the dataset, we investigate both the improvement in classification performance obtained through deep music generation and the ability of the generators to produce emotional music. Finally, we use a classifier trained on real data to evaluate the label validity of class-conditionally generated samples.
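
The label-validity check can be sketched as an agreement rate: a classifier trained only on real data labels each class-conditionally generated sample, and the fraction of samples assigned back to their conditioning label is reported per class. This is a minimal sketch under stated assumptions. The nearest-centroid classifier, the per-class agreement rate, and the names `fit_nearest_centroid`, `predict`, and `label_validity` are hypothetical illustrations rather than the paper's implementation, and samples are assumed to be precomputed feature vectors.

```python
from collections import defaultdict

Vector = list[float]  # assumption: each sample is already a feature vector


def fit_nearest_centroid(X: list[Vector], y: list[int]) -> dict[int, Vector]:
    """Toy stand-in (hypothetical) for a classifier trained on real data only."""
    grouped: dict[int, list[Vector]] = defaultdict(list)
    for x, label in zip(X, y):
        grouped[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids: dict[int, Vector], x: Vector) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def label_validity(real_train, generated) -> dict[int, float]:
    """Per conditioning label, the fraction of generated samples that the
    real-data classifier assigns back to that same label."""
    centroids = fit_nearest_centroid(*real_train)
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    X_gen, y_cond = generated
    for x, cond in zip(X_gen, y_cond):
        totals[cond] += 1
        hits[cond] += int(predict(centroids, x) == cond)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    real_train = ([[0.0], [0.2], [2.0], [2.2]], [0, 0, 1, 1])
    # Hypothetical generated samples and the labels they were conditioned on.
    generated = ([[0.3], [1.9], [2.5], [2.4]], [0, 0, 1, 1])
    print(label_validity(real_train, generated))  # {0: 0.5, 1: 1.0}
```

Reporting the rate per class, rather than as one pooled number, makes it visible when a generator produces convincing samples for some conditioning labels but not others.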
