A Hierarchical Recurrent Neural Network for Symbolic Melody Generation
This work addresses the challenge of long-term structure in melody generation for music AI applications, though it is incremental in that it builds on existing neural network approaches.
The authors tackle the problem of generating symbolic melodies with long-term structure by proposing a hierarchical recurrent neural network with three LSTM subnetworks that work in a coarse-to-fine manner. In human evaluations, it produced better melodies than the state-of-the-art models MidiNet and MusicVAE.
In recent years, neural networks have been used to generate symbolic melodies. However, the long-term structure of melodies has made it difficult to design a good model. In this paper, we present a hierarchical recurrent neural network for melody generation, which consists of three Long Short-Term Memory (LSTM) subnetworks working in a coarse-to-fine manner along time. Specifically, the three subnetworks generate bar profiles, beat profiles and notes in turn, and the outputs of the high-level subnetworks are fed into the low-level subnetworks, serving as guidance for generating the finer time-scale melody components. Two human behavior experiments demonstrate the advantage of this structure over a single-layer LSTM, which attempts to learn all hidden structures in melodies. Compared with the state-of-the-art models MidiNet and MusicVAE, the hierarchical recurrent neural network produces melodies that human evaluators rate as better.
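The coarse-to-fine data flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: each LSTM subnetwork is replaced by a toy scalar recurrence with random, untrained weights (`ToyRecurrentLayer`, a hypothetical name), and the time resolution, the vocabulary sizes, the encoding of profiles as integer IDs, and the choice to feed both bar and beat profiles to the note level are all assumptions made for illustration. Only the nesting of the three levels and the downward flow of guidance come from the text.

```python
import math
import random

# Assumed resolution and vocabulary sizes (not taken from the paper).
BEATS_PER_BAR = 4
STEPS_PER_BEAT = 4
NUM_BAR_PROFILES = 8
NUM_BEAT_PROFILES = 8
NUM_PITCHES = 12


class ToyRecurrentLayer:
    """Stand-in for one LSTM subnetwork: a scalar tanh recurrence, untrained."""

    def __init__(self, num_outputs, rng):
        self.num_outputs = num_outputs
        self.rng = rng
        self.w_state = rng.uniform(-1, 1)   # weight on the previous hidden state
        self.w_prev = rng.uniform(-1, 1)    # weight on the previous output
        self.w_guide = rng.uniform(-1, 1)   # weight on guidance from higher levels
        self.out_w = [rng.uniform(-1, 1) for _ in range(num_outputs)]
        self.state = 0.0
        self.prev = 0

    def step(self, guidance=0.0):
        # Update the hidden state from its own history and the guidance signal.
        self.state = math.tanh(
            self.w_state * self.state
            + self.w_prev * self.prev / self.num_outputs
            + self.w_guide * guidance
        )
        # Softmax over output symbols, then sample one symbol.
        logits = [w * self.state for w in self.out_w]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        self.prev = self.rng.choices(range(self.num_outputs), weights=weights)[0]
        return self.prev


def generate(num_bars, seed=0):
    """Generate a melody as a flat list of pitch indices, one per time step."""
    rng = random.Random(seed)
    bar_layer = ToyRecurrentLayer(NUM_BAR_PROFILES, rng)
    beat_layer = ToyRecurrentLayer(NUM_BEAT_PROFILES, rng)
    note_layer = ToyRecurrentLayer(NUM_PITCHES, rng)

    melody = []
    for _ in range(num_bars):
        # Coarsest level: one profile per bar, no guidance from above.
        bar_profile = bar_layer.step()
        bar_signal = bar_profile / NUM_BAR_PROFILES
        for _ in range(BEATS_PER_BAR):
            # Middle level: one profile per beat, guided by the bar profile.
            beat_profile = beat_layer.step(guidance=bar_signal)
            beat_signal = beat_profile / NUM_BEAT_PROFILES
            for _ in range(STEPS_PER_BEAT):
                # Finest level: one note per step, guided by both profiles.
                guidance = (bar_signal + beat_signal) / 2
                melody.append(note_layer.step(guidance=guidance))
    return melody


if __name__ == "__main__":
    print(generate(num_bars=2))
```

The point of the sketch is the loop structure: the bar level advances once per bar, the beat level once per beat, and the note level once per time step, so the slower levels hold their output fixed while the faster levels run. A single-layer baseline would correspond to calling only `note_layer.step()` with no guidance.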