53.8 · SD · Jun 2
Tonal parsimony in chord-sequence analysis: combining modulation cost and tonal vocabulary
François Pachet
We study the assignment of local tonalities to chord sequences, a task useful for harmonic analysis, composition, and jazz-oriented improvisation. Standard dynamic-programming approaches minimize modulations but can introduce unnecessarily many tonal centers. We compare this transition-only objective with pure minimum-vocabulary analysis and with tonal parsimony, which minimizes lexicographically the number of modulations and then the number of distinct tonalities. Although this joint objective is combinatorially hard in general, we give exact algorithms exploiting the fixed 24-tonality major/minor universe. On 31,032 LMD Chords sequences, tonal parsimony preserves the transition optimum while reducing tonal vocabulary in 55.8% of cases. With weighted jazz-substitution closure, it lowers mean tonalities from 3.802 to 3.206 and modulations from 16.728 to 12.141. On 1,555 annotated jazz standards, it improves compatible chord-scale agreement to 95.6%, supporting tractable professional-scale harmonic analysis.
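The lexicographic objective described above can be illustrated with a small sketch. It is not the exact algorithm from the paper: it is a plain dynamic program whose state is the current tonality plus the set of tonalities used so far, which is only practical for toy inputs. The function name `parsimonious_labeling` and the representation of each chord as a set of compatible tonalities are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

State = Tuple[str, FrozenSet[str]]  # (current tonality, tonalities used so far)


def parsimonious_labeling(candidates: Sequence[Set[str]]) -> Tuple[int, int, List[str]]:
    """Label each chord with one of its candidate tonalities.

    Minimizes the number of modulations first, then the number of
    distinct tonalities. Returns (modulations, distinct tonalities, labels).
    """
    # For every state keep the cheapest labeling (fewest modulations) reaching it.
    states: Dict[State, Tuple[int, List[str]]] = {}
    for t in sorted(candidates[0]):
        states[(t, frozenset([t]))] = (0, [t])

    for options in candidates[1:]:
        nxt: Dict[State, Tuple[int, List[str]]] = {}
        for (prev, used), (mods, labels) in states.items():
            for t in sorted(options):
                key = (t, used | {t})
                cost = mods + (1 if t != prev else 0)
                if key not in nxt or cost < nxt[key][0]:
                    nxt[key] = (cost, labels + [t])
        states = nxt

    # Lexicographic choice: modulations first, vocabulary size second.
    (_, used), (mods, labels) = min(
        states.items(), key=lambda kv: (kv[1][0], len(kv[0][1]))
    )
    return mods, len(used), labels


# Toy example: the last chord fits both C and D. Both choices cost two
# modulations, but returning to C uses fewer distinct tonalities.
result = parsimonious_labeling([{"C"}, {"G"}, {"C", "D"}])
```

In this toy case a transition-only objective is indifferent between ending in C or D, whereas the second criterion prefers the labeling that reuses C.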
17.1 · AI · May 8
Exact Regular-Constrained Variable-Order Markov Generation via Sparse Context-State Belief Propagation
François Pachet
Variable-order Markov models generate sequences over a finite alphabet by conditioning each symbol on the longest available suffix of the generated history. Regular constraints, by contrast, describe finite-horizon control requirements by an automaton: fixed positions, forced endings, metrical patterns, and forbidden copied fragments are all special cases. Existing exact methods already handle regular constraints with belief propagation for first-order Markov chains. The contribution here is the variable-order extension: identifying the state space on which the existing BP-regular machinery must be run when the generator is a variable-order/backoff model. A first-order constraint layer can enforce useful support conditions, but it computes future mass after merging histories that a variable-order generator deliberately keeps distinct. We formalize this mismatch and give the sparse construction obtained by replacing the first-order Markov state with the observed context state, then taking the standard product with the regular constraint automaton. For a fixed trained context graph and automaton, inference is linear in the sequence horizon; in general it is polynomial in the number of reachable product edges. This gives the correct variable-order distribution conditioned on regular constraints without expanding to all K-tuples. The same finite-source interface supports reversible data augmentation by inverse count lookup, matching materialized transposition augmentation without storing transformed corpora. We also separate exact BP inference from generation-time backoff policies, such as singleton avoidance, whose stochastic semantics must be made explicit if exactness is claimed.
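A minimal sketch of the product construction described above, under stated assumptions: the context graph `TRANS`, the automaton `DFA` (which forces the sequence to end with "a"), and all function names are hypothetical toy choices, not the paper's implementation. The backward pass runs over reachable (context, automaton state) pairs only, and the result is checked against brute-force enumeration.

```python
from itertools import product

# Hypothetical toy context graph: context -> {symbol: (probability, next context)}.
# Only the context ("a", "b") has order 2; other histories back off to order 1.
TRANS = {
    (): {"a": (0.5, ("a",)), "b": (0.5, ("b",))},
    ("a",): {"a": (0.3, ("a",)), "b": (0.7, ("a", "b"))},
    ("b",): {"a": (0.6, ("a",)), "b": (0.4, ("b",))},
    ("a", "b"): {"a": (0.9, ("a",)), "b": (0.1, ("b",))},
}
# Hypothetical DFA: state 1 means "the last symbol was a".
DFA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
ACCEPTING = {1}
START = ((), 0)


def backward_mass(trans, dfa, accepting, start, horizon):
    """beta[t][(c, q)]: probability that the remaining symbols end in an accepting state."""
    # Forward pass: collect reachable product states layer by layer.
    layers = [{start}]
    for _ in range(horizon):
        nxt = set()
        for c, q in layers[-1]:
            for s, (_, c2) in trans[c].items():
                if (q, s) in dfa:
                    nxt.add((c2, dfa[(q, s)]))
        layers.append(nxt)
    # Backward pass over the reachable product states only.
    beta = [dict() for _ in range(horizon + 1)]
    for c, q in layers[horizon]:
        beta[horizon][(c, q)] = 1.0 if q in accepting else 0.0
    for t in range(horizon - 1, -1, -1):
        for c, q in layers[t]:
            total = 0.0
            for s, (p, c2) in trans[c].items():
                if (q, s) in dfa:
                    total += p * beta[t + 1][(c2, dfa[(q, s)])]
            beta[t][(c, q)] = total
    return beta


def conditioned_step(trans, dfa, beta, t, state):
    """Next-symbol distribution at step t, conditioned on the constraint holding."""
    c, q = state
    weights = {}
    for s, (p, c2) in trans[c].items():
        if (q, s) in dfa:
            w = p * beta[t + 1][(c2, dfa[(q, s)])]
            if w > 0.0:
                weights[s] = w
    z = sum(weights.values())  # equals beta[t][state]; zero means no valid completion
    return {s: w / z for s, w in weights.items()}


def sequence_prob(trans, ctx, seq):
    """Unconstrained probability of seq under the context graph."""
    prob = 1.0
    for s in seq:
        step, ctx = trans[ctx][s]
        prob *= step
    return prob


def brute_force_mass(trans, start_ctx, alphabet, horizon, accept):
    """Reference value: enumerate every sequence and sum the accepted mass."""
    return sum(
        sequence_prob(trans, start_ctx, seq)
        for seq in product(alphabet, repeat=horizon)
        if accept(seq)
    )
```

Sampling each symbol from `conditioned_step` and then following the chosen edge in both the context graph and the automaton draws from the model's distribution conditioned on the constraint, at least for this toy setting.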
SD · Jun 16, 2020
Assisted music creation with Flow Machines: towards new categories of new
François Pachet, Pierre Roy, Benoit Carré
This chapter reflects on about 10 years of research in AI-assisted music composition, in particular during the Flow Machines project. We reflect on the motivations for such a project, its background, its main results and impact, both technological and musical, several years after its completion. We conclude with a proposal for new categories of new, created by the many uses of AI techniques to generate novel material.
IR · Mar 12, 2019
The Skipping Behavior of Users of Music Streaming Services and its Relation to Musical Structure
Nicola Montecchio, Pierre Roy, François Pachet
The behavior of users of music streaming services is investigated from the point of view of the temporal dimension of individual songs; specifically, the main object of the analysis is the point in time within a song at which users stop listening and start streaming another song ("skip"). The main contribution of this study is the ascertainment of a correlation between the distribution in time of skipping events and the musical structure of songs. It is also shown that this distribution is not only specific to individual songs, but also independent of the cohort of users and, under stationary conditions, of the date of observation. Finally, user behavioral data is used to train a predictor of the musical structure of a song solely from its acoustic content; it is shown that the use of such data, available in large quantities to music streaming services, yields significant improvements in accuracy over the customary fashion of training this class of algorithms, in which only smaller amounts of hand-labeled data are available.
SD · Dec 9, 2017
Music Generation by Deep Learning - Challenges and Directions
Jean-Pierre Briot, François Pachet
In addition to traditional tasks such as prediction, classification and translation, deep learning is receiving growing attention as an approach for music generation, as witnessed by recent research groups such as Magenta at Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is in using the capacity of deep learning architectures and training techniques to automatically learn musical styles from arbitrary musical corpora and then to generate samples from the estimated distribution. However, a direct application of deep learning to generate content rapidly reaches limits as the generated content tends to mimic the training set without exhibiting true creativity. Moreover, deep learning architectures do not offer direct ways for controlling generation (e.g., imposing some tonality or other arbitrary constraints). Furthermore, deep learning architectures alone are autistic automata which generate music autonomously without human user interaction, far from the objective of interactively assisting musicians to compose and refine music. Issues such as control, structure, creativity and interactivity are the focus of our analysis. In this paper, we select some limitations of a direct application of deep learning to music generation, analyze why these requirements are not met, and discuss possible approaches to address them. Various recent systems are cited as examples of promising directions.
LG · Jul 14, 2017
GLSR-VAE: Geodesic Latent Space Regularization for Variational AutoEncoder Architectures
Gaëtan Hadjeres, Frank Nielsen, François Pachet
VAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes. In many settings, the data we model possesses continuous attributes that we would like to take into account at generation time. We propose in this paper GLSR-VAE, a Geodesic Latent Space Regularization for the Variational AutoEncoder architecture and its generalizations, which allows fine control over the embedding of the data into the latent space. When augmenting the VAE loss with this regularization, changes in the learned latent space reflect changes of the attributes of the data. This deeper understanding of the VAE latent space structure offers the possibility to modulate the attributes of the generated data in a continuous way. We demonstrate its efficiency on a monophonic music generation task where we manage to generate variations of discrete sequences in an intended and playful way.
AI · Mar 2, 2017
Sampling Variations of Lead Sheets
Pierre Roy, Alexandre Papadopoulos, François Pachet
Machine-learning techniques have been recently used with spectacular results to generate artefacts such as music or text. However, these techniques are still unable to capture and generate artefacts that are convincingly structured. In this paper we present an approach to generate structured musical sequences. We introduce a mechanism for efficiently sampling variations of musical sequences. Given an input sequence and a statistical model, this mechanism samples a set of sequences whose distance to the input sequence is approximately within specified bounds. This mechanism is implemented as an extension of belief propagation, and uses local fields to bias the generation. We show experimentally that sampled sequences are indeed closely correlated to the standard musical similarity measure defined by Mongeau and Sankoff. We then show how this mechanism can be used to implement composition strategies that enforce arbitrary structure on a musical lead sheet generation problem.
AI · Dec 3, 2016
DeepBach: a Steerable Model for Bach Chorales Generation
Gaëtan Hadjeres, François Pachet, Frank Nielsen
This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. DeepBach's strength comes from the use of pseudo-Gibbs sampling coupled with an adapted representation of musical data. This is in contrast with many automatic music composition approaches which tend to compose music sequentially. Our model is also steerable in the sense that a user can constrain the generation by imposing positional constraints such as notes, rhythms or cadences in the generated score. We also provide a plugin on top of the MuseScore music editor that makes interaction with DeepBach easy.
SD · Nov 27, 2016
SISO and SIMO Accompaniment Cancellation for Live Solo Recordings Based on Short-Time ERB-Band Wiener Filtering and Spectral Subtraction
Stanislaw Gorlow, Mathieu Ramona, François Pachet
Research in collaborative music learning is subject to unresolved problems demanding new technological solutions. One such problem is the suppression of the accompaniment in a live recording of a performance during practice, which can be for the purposes of self-assessment or further machine-aided analysis. Being able to separate a solo from the accompaniment makes it possible to create learning agents that may act as personal tutors and help the apprentice improve his or her technique. First, we start from the classical adaptive noise cancelling approach, and adjust it to the problem at hand. In a second step, we compare some adaptive and Wiener filtering approaches and assess their performances on the task. Our findings indicate that adaptive filtering is ill-suited to music signals and that Wiener filtering in the short-time Fourier transform domain is a much more effective approach. In addition, it is very cheap if carried out in the frequency bands of auditory filters. A double-output extension based on maximal-ratio combining is also proposed.
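As a rough illustration of how Wiener filtering and spectral subtraction combine, the sketch below computes a per-band gain for one short-time frame. It assumes that the mixture power in each band is approximately the sum of solo and accompaniment power, and that an accompaniment power estimate is already available. The function name `wiener_gains` and its `floor` parameter are hypothetical; the paper's ERB-band estimator is not reproduced here.

```python
from typing import List, Sequence


def wiener_gains(
    mix_power: Sequence[float],
    acc_power: Sequence[float],
    floor: float = 0.0,
) -> List[float]:
    """Per-band gain to apply to the mixture spectrum of one frame.

    mix_power: power of the recorded mixture in each band.
    acc_power: estimated power of the accompaniment in each band.
    floor: lower bound on the gain (an illustrative parameter).
    """
    gains = []
    for p_mix, p_acc in zip(mix_power, acc_power):
        if p_mix <= 0.0:
            # Empty band: nothing to recover.
            gains.append(floor)
            continue
        # Spectral subtraction: solo power estimate, clipped at zero.
        p_solo = max(p_mix - p_acc, 0.0)
        # Wiener-style gain: solo power over mixture power.
        gains.append(max(p_solo / p_mix, floor))
    return gains


# Three bands: mostly solo, accompaniment-dominated, and silent.
example = wiener_gains([4.0, 1.0, 0.0], [1.0, 2.0, 0.5])
```

Computing one gain per auditory band rather than per frequency bin keeps the number of multiplications per frame small, which is consistent with the low cost mentioned above.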
SD · Nov 20, 2016
Decision-Based Transcription of Jazz Guitar Solos Using a Harmonic Bident Analysis Filter Bank and Spectral Distribution Weighting
Stanislaw Gorlow, Mathieu Ramona, François Pachet
Jazz guitar solos are improvised melody lines played on one instrument on top of a chordal accompaniment (comping). As the improvisation happens spontaneously, no reference score exists, only a lead sheet. There are situations, however, when one would like to have the original melody lines in the form of notated music (cf. the Real Book). The motivation is either for the purpose of practice and imitation or for musical analysis. In this work, an automatic transcriber for jazz guitar solos is developed. It resorts to a very intuitive representation of tonal music signals: the pitchgram. No instrument-specific modeling is involved, so the transcriber should be applicable to other pitched instruments as well. Neither is there the need to learn any note profiles prior to or during the transcription. Essentially, the proposed transcriber is a decision tree, thus a classifier, with a depth of 3. It has a (very) low computational complexity and can be run on-line. The decision rules can be refined or extended with little or no musical education. The transcriber's performance is evaluated on a set of ten jazz solo excerpts and compared with a state-of-the-art transcription system for the guitar plus PYIN. We achieve an improvement of 34% w.r.t. the reference system and 19% w.r.t. PYIN in terms of the F-measure. Another measure of accuracy, the error score, attests that the number of erroneous pitch detections is reduced by more than 50% w.r.t. the reference system and by 45% w.r.t. PYIN.
AI · Oct 12, 2016
Maximum entropy models for generation of expressive music
Simon Moulieras, François Pachet
In the context of contemporary monophonic music, expression can be seen as the difference between a musical performance and its symbolic representation, i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt) models can be used to generate musical expression in order to mimic a human performance. As a training corpus, we had a professional pianist play about 150 melodies of jazz, pop, and latin jazz. The results show a good predictive power, validating the choice of our model. Additionally, we set up a listening test whose results reveal that on average, people significantly prefer the melodies generated by the MaxEnt model to the ones without any expression, or with fully random expression. Furthermore, in some cases, MaxEnt melodies are almost as popular as the human-performed ones.
ML · Oct 11, 2016
Maximum entropy models capture melodic styles
Jason Sakellariou, Francesca Tria, Vittorio Loreto et al.
We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus which was used to train it. Instead of using the $n$-body interactions of $(n-1)$-order Markov models, traditionally used in automatic music generation, we use a $k$-nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid over-fitting problems typical of Markov models. We show that long-range musical phrases don't need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by contrasting how much the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme opens the possibility to generate musically sensible alterations of the original phrases, providing a way to generate innovation.
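The claim about keeping the number of parameters low can be made concrete with a back-of-the-envelope count. The sketch below assumes a parametrization that the abstract does not spell out: one local field per symbol plus one coupling table per neighbour distance from 1 to k for the pairwise model, and one probability entry per (context, next symbol) pair for the Markov model. The function names and the alphabet size are illustrative only.

```python
def markov_table_size(q: int, n: int) -> int:
    """Entries in the transition table of an (n-1)-order Markov model.

    There are q**(n-1) contexts and q possible next symbols per context.
    """
    return q ** n


def pairwise_maxent_size(q: int, k: int) -> int:
    """Parameters of a pairwise model reaching k neighbours (assumed form).

    q local fields plus one q-by-q coupling table for each distance 1..k.
    """
    return q + k * q * q


# Illustrative numbers: 12 symbols, both models spanning 5 consecutive notes.
markov = markov_table_size(12, 5)       # 12**5 entries
pairwise = pairwise_maxent_size(12, 4)  # 12 + 4 * 144 parameters
```

Under these assumptions the Markov table grows exponentially with the span, while the pairwise model grows linearly in k, which is the trade-off the abstract points to.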
AI · Sep 16, 2016
Style Imitation and Chord Invention in Polyphonic Music with Exponential Families
Gaëtan Hadjeres, Jason Sakellariou, François Pachet
Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.