Maximum entropy models for generation of expressive music
This work addresses the automated generation of expressive music for applications in music technology. It is incremental in that it applies an existing statistical method, Maximum Entropy modeling, to a specific domain.
The paper tackles the generation of expressive monophonic music by modeling the difference between a musical score and a human performance with Maximum Entropy (MaxEnt) models. The models are trained on about 150 melodies played by a professional pianist. The results show good predictive power, and in a listening test participants significantly preferred MaxEnt-generated melodies to melodies with no expression or with fully random expression. In some cases, the MaxEnt melodies were almost as popular as the human performances.
In the context of contemporary monophonic music, expression can be seen as the difference between a musical performance and its symbolic representation, i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt) models can be used to generate musical expression in order to mimic a human performance. As a training corpus, we had a professional pianist play about 150 melodies of jazz, pop, and Latin jazz. The results show good predictive power, validating the choice of our model. Additionally, we set up a listening test whose results reveal that, on average, people significantly prefer the melodies generated by the MaxEnt model to the ones without any expression or with fully random expression. Furthermore, in some cases, MaxEnt melodies are almost as popular as the human-performed ones.
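To make the two central ideas concrete, here is a minimal sketch in Python. It is not the paper's implementation: the note representation (onset, duration, velocity), the example values, the outcome labels, and the function names `expressive_deviations` and `maxent_distribution` are all assumptions made for illustration. The first function treats expression as the per-note difference between a performance and its score. The second evaluates the generic MaxEnt (exponential-family) form p(y) = exp(sum_k w_k f_k(y)) / Z over a small set of candidate outcomes. The abstract does not specify the features or the structure of the model, so those are placeholders.

```python
import math

# Hypothetical note representation: (onset_in_beats, duration_in_beats, velocity).
# The values below are made up for illustration, not taken from the paper's corpus.
score = [(0.0, 1.0, 64), (1.0, 1.0, 64), (2.0, 2.0, 64)]
performance = [(0.02, 0.9, 70), (1.05, 0.95, 58), (1.98, 2.1, 75)]


def expressive_deviations(score, performance):
    """Per-note difference (performance minus score) for onset, duration, velocity."""
    return [
        (round(p[0] - s[0], 6), round(p[1] - s[1], 6), p[2] - s[2])
        for s, p in zip(score, performance)
    ]


def maxent_distribution(weights, features):
    """Generic MaxEnt form: p(y) = exp(sum_k w_k * f_k(y)) / Z.

    `features` maps each candidate outcome y to its feature vector f(y).
    The feature definitions are assumptions; the abstract does not give them.
    """
    scores = {
        y: sum(w * f for w, f in zip(weights, feats))
        for y, feats in features.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # partition function Z
    return {y: e / z for y, e in exps.items()}


deviations = expressive_deviations(score, performance)

# Toy distribution over three hypothetical timing classes with binary features.
probs = maxent_distribution(
    weights=[1.0, 0.5],
    features={"early": (1, 0), "on_time": (0, 1), "late": (0, 0)},
)
```

In a trained model the weights would be fitted so that the expected feature values under the model match those observed in the recorded performances. Here they are fixed by hand only to show the shape of the computation.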