Generating Music with a Self-Correcting Non-Chronological Autoregressive Model
This work addresses error accumulation in autoregressive music generation and supports human-AI collaborative composition, though the contribution is incremental in that it builds on existing autoregressive models.
The paper introduces a self-correcting, non-chronological autoregressive model that represents music as a sequence of edit events, which lets the model fix earlier mistakes and gives finer control over composition. Quantitative metrics and human surveys indicate that it outperforms orderless NADE and Gibbs sampling approaches.
We describe a novel approach for generating music using a self-correcting, non-chronological, autoregressive model. We represent music as a sequence of edit events, each of which denotes either the addition or removal of a note, including a note previously generated by the model. During inference, we generate one edit event at a time using direct ancestral sampling. Our approach allows the model to fix previous mistakes, such as incorrectly sampled notes, and prevents the accumulation of errors to which autoregressive models are prone. Another benefit is finer, note-by-note control during human-AI collaborative composition. We show through quantitative metrics and human survey evaluation that our approach generates better results than orderless NADE and Gibbs sampling approaches.
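To make the edit-event idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical encoding in which a note is a (time step, MIDI pitch) pair and an edit event is an ("add" or "remove", time step, pitch) triple. The names `apply_edit`, `toy_model`, and `generate` are illustrative, and `toy_model` is a uniform stand-in for a trained network. The loop samples one edit at a time, conditioned on the current notes and the edit history, which is the shape of ancestral sampling described above.

```python
import random
from typing import List, Set, Tuple

Note = Tuple[int, int]       # (time_step, midi_pitch); hypothetical encoding
Edit = Tuple[str, int, int]  # ("add" | "remove", time_step, midi_pitch)


def apply_edit(notes: Set[Note], edit: Edit) -> Set[Note]:
    """Return a new note set with a single edit event applied."""
    op, t, p = edit
    updated = set(notes)
    if op == "add":
        updated.add((t, p))
    elif op == "remove":
        updated.discard((t, p))
    else:
        raise ValueError(f"unknown edit op: {op}")
    return updated


def toy_model(notes: Set[Note], history: List[Edit]):
    """Stand-in for a trained model (assumption): uniform weights over legal edits.

    Removal candidates cover notes already placed, so earlier choices can be undone.
    """
    grid = [(t, p) for t in range(4) for p in (60, 64, 67)]
    candidates = [("add", t, p) for (t, p) in grid if (t, p) not in notes]
    candidates += [("remove", t, p) for (t, p) in sorted(notes)]
    weights = [1.0] * len(candidates)
    return candidates, weights


def generate(model, num_edits: int, seed: int = 0):
    """Sample edit events one at a time, each conditioned on the current state."""
    rng = random.Random(seed)
    notes: Set[Note] = set()
    history: List[Edit] = []
    for _ in range(num_edits):
        candidates, weights = model(notes, history)
        edit = rng.choices(candidates, weights=weights, k=1)[0]
        notes = apply_edit(notes, edit)
        history.append(edit)
    return notes, history


if __name__ == "__main__":
    final_notes, edits = generate(toy_model, num_edits=10)
    print(edits)
    print(sorted(final_notes))
```

Because the final piece is just the result of replaying the edit history, a human collaborator could in principle append their own add or remove events to the same history between model steps; that interaction pattern is an assumption of this sketch rather than a detail given in the abstract.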