This Time with Feeling: Learning Expressive Musical Performance
This work addresses the challenge of generating realistic, expressive musical performances, a problem relevant to composers and musicians, but its contribution is incremental because it builds on existing LSTM methods.
The paper tackles expressive performance generation by jointly predicting notes together with their expressive timing and dynamics. It shows that an LSTM-based recurrent model performs well on this task under subjective evaluation, and it reports feedback from professional composers and musicians on some of the generated examples.
Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct $\it performance$ generation: jointly predicting the notes $\it and$ $\it also$ their expressive timing and dynamics. We consider the significance and qualities of the data set needed for this. Having identified both a problem domain and characteristics of an appropriate data set, we show an LSTM-based recurrent network model that subjectively performs quite well on this task. Critically, we provide generated examples. We also include feedback from professional composers and musicians about some of these examples.
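To make "jointly predicting the notes and also their expressive timing and dynamics" concrete, the sketch below shows one way a performance could be flattened into a single event stream that a sequence model such as an LSTM predicts one token at a time. The abstract does not specify a representation, so everything here is an assumption made for illustration: the event names, the 10 ms time step, the 100-step cap on a single time shift, the 32 velocity bins, and the helper `encode_performance`.

```python
from typing import List, Tuple

# (MIDI pitch, onset in seconds, offset in seconds, MIDI velocity 1-127)
Note = Tuple[int, float, float, int]
Event = Tuple[str, int]

# All three constants are illustrative assumptions, not values from the abstract.
TIME_STEP = 0.01        # seconds per time-shift unit
MAX_SHIFT_STEPS = 100   # largest gap a single TIME_SHIFT event may cover
VELOCITY_BINS = 32      # coarse quantization of MIDI velocity


def encode_performance(notes: List[Note]) -> List[Event]:
    """Flatten notes into one stream of timing, dynamics, and pitch events."""
    boundaries = []
    for pitch, onset, offset, velocity in notes:
        # The second field orders simultaneous events: note-offs (0) before note-ons (1).
        boundaries.append((round(onset / TIME_STEP), 1, "NOTE_ON", pitch, velocity))
        boundaries.append((round(offset / TIME_STEP), 0, "NOTE_OFF", pitch, 0))
    boundaries.sort()

    events: List[Event] = []
    now = 0
    current_bin = None
    for step, _, kind, pitch, velocity in boundaries:
        # Expressive timing: advance the clock, splitting long gaps into capped shifts.
        gap = step - now
        while gap > 0:
            shift = min(gap, MAX_SHIFT_STEPS)
            events.append(("TIME_SHIFT", shift))
            gap -= shift
        now = step
        if kind == "NOTE_ON":
            # Dynamics: emit a velocity event only when the quantized bin changes.
            velocity_bin = (velocity - 1) * VELOCITY_BINS // 127
            if velocity_bin != current_bin:
                events.append(("VELOCITY", velocity_bin))
                current_bin = velocity_bin
        events.append((kind, pitch))
    return events


# Two overlapping notes: the second enters a quarter second later and more softly.
example = encode_performance([(60, 0.0, 0.5, 80), (64, 0.25, 0.5, 60)])
```

Because pitch, timing, and dynamics share one vocabulary in this sketch, a single next-event predictor covers all three at once, in contrast to generating a score first and interpreting it in a separate step.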