Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
This work addresses the challenge of generating structured pop music for automated composition, though the contribution is incremental: it builds on existing Transformer methods and changes the data representation.
The paper tackles the problem of generating expressive pop piano music with coherent rhythmic structure by improving the data representation fed to Transformers. The resulting model composes music with better rhythmic structure than existing Transformer models.
A great number of deep learning-based models have recently been proposed for automatic music composition. Among these models, the Transformer stands out as a prominent approach for generating expressive classical piano performances with a coherent structure of up to one minute. The model is powerful in that it learns abstractions of data on its own, without much human-imposed domain knowledge or constraints. In contrast with this general approach, this paper shows that Transformers can do even better for music modeling when we improve the way a musical score is converted into the data fed to a Transformer model. In particular, we seek to impose a metrical structure on the input data, so that Transformers can more easily become aware of the beat-bar-phrase hierarchical structure in music. The new data representation maintains the flexibility of local tempo changes and provides handles to control the rhythmic and harmonic structure of music. With this approach, we build a Pop Music Transformer that composes pop piano music with better rhythmic structure than existing Transformer models.
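To make the idea of imposing a metrical structure on the input data concrete, the following is a minimal sketch of one way to turn notes into a beat-based token sequence. It is an illustration under stated assumptions, not the paper's actual encoding: the function name `to_beat_tokens`, the note tuple layout, the token names (`Bar`, `Position_*`, `Pitch_*`, `Duration_*`), the fixed 4/4 meter, and the 16-steps-per-bar grid are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical note format: (onset_in_beats, midi_pitch, duration_in_beats).
Note = Tuple[float, int, float]


def to_beat_tokens(
    notes: List[Note],
    beats_per_bar: int = 4,    # assumed fixed meter (4/4)
    steps_per_beat: int = 4,   # assumed grid: 16 positions per bar
) -> List[str]:
    """Convert notes into a token list with explicit bar and position markers."""
    steps_per_bar = beats_per_bar * steps_per_beat
    tokens: List[str] = []
    current_bar = -1

    for onset, pitch, duration in sorted(notes):
        # Quantize the onset to the nearest grid step.
        step = round(onset * steps_per_beat)
        bar, position = divmod(step, steps_per_bar)

        # Emit one "Bar" token per bar boundary crossed, including empty bars,
        # so that the sequence itself carries the metrical grid.
        while current_bar < bar:
            current_bar += 1
            tokens.append("Bar")

        # Position is written relative to the bar (1-based), not as a time shift.
        tokens.append(f"Position_{position + 1}/{steps_per_bar}")
        tokens.append(f"Pitch_{pitch}")

        # Duration in grid steps, at least one step long.
        duration_steps = max(1, round(duration * steps_per_beat))
        tokens.append(f"Duration_{duration_steps}")

    return tokens


if __name__ == "__main__":
    example = [(0.0, 60, 1.0), (4.5, 64, 0.5)]
    print(to_beat_tokens(example))
```

Because every note is placed relative to an explicit bar marker rather than to the previous event, the bar and beat grid is visible directly in the sequence. A fuller encoding would also need tokens for tempo and harmony to support the local tempo changes and harmonic control described above; those are omitted here.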