Jean-Pierre Briot

SD
11papers
806citations
Novelty17%
AI Score21

11 Papers

LGOct 26, 2023Code
miditok: A Python package for MIDI file tokenization

Nathan Fradet, Jean-Pierre Briot, Fabien Chhel et al.

Recent progress in natural language processing has been adapted to the symbolic music modality. Language models, such as Transformers, have been used with symbolic music for a variety of tasks among which music generation, modeling or transcription, with state-of-the-art performances. These models are beginning to be used in production products. To encode and decode music for the backbone model, they need to rely on tokenizers, whose role is to serialize music into sequences of distinct elements called tokens. MidiTok is an open-source library allowing to tokenize symbolic music with great flexibility and extended features. It features the most popular music tokenizations, under a unified API. It is made to be easily used and extensible for everyone.

LGJan 27, 2023
Byte Pair Encoding for Symbolic Music

Nathan Fradet, Nicolas Gutowski, Fabien Chhel et al.

When used with deep learning, the symbolic music modality is often coupled with language model architectures. To do so, the music needs to be tokenized, i.e. converted into a sequence of discrete tokens. This can be achieved by different approaches, as music can be composed of simultaneous tracks, of simultaneous notes with several attributes. Until now, the proposed tokenizations rely on small vocabularies of tokens describing the note attributes and time events, resulting in fairly long token sequences, and a sub-optimal use of the embedding space of language models. Recent research has put efforts on reducing the overall sequence length by merging embeddings or combining tokens. In this paper, we show that Byte Pair Encoding, a compression technique widely used for natural language, significantly decreases the sequence length while increasing the vocabulary size. By doing so, we leverage the embedding capabilities of such models with more expressive tokens, resulting in both better results and faster inference in generation and classification tasks. The source code is shared on Github, along with a companion website. Finally, BPE is directly implemented in MidiTok, allowing the reader to easily benefit from this method.

SDJul 4, 2022
An adaptive music generation architecture for games based on the deep learning Transformer mode

Gustavo Amaral Costa dos Santos, Augusto Baffa, Jean-Pierre Briot et al.

This paper presents an architecture for generating music for video games based on the Transformer deep learning model. Our motivation is to be able to customize the generation according to the taste of the player, who can select a corpus of training examples, corresponding to his preferred musical style. The system generates various musical layers, following the standard layering strategy currently used by composers designing video game music. To adapt the music generated to the game play and to the player(s) situation, we are using an arousal-valence model of emotions, in order to control the selection of musical layers. We discuss current limitations and prospects for the future, such as collaborative and interactive control of the musical components.

SDOct 12, 2023
Impact of time and note duration tokenizations on deep learning symbolic music modeling

Nathan Fradet, Nicolas Gutowski, Fabien Chhel et al.

Symbolic music is widely used in various deep learning tasks, including generation, transcription, synthesis, and Music Information Retrieval (MIR). It is mostly employed with discrete models like Transformers, which require music to be tokenized, i.e., formatted into sequences of distinct elements called tokens. Tokenization can be performed in different ways. As Transformer can struggle at reasoning, but capture more easily explicit information, it is important to study how the way the information is represented for such model impact their performances. In this work, we analyze the common tokenization methods and experiment with time and note duration representations. We compare the performances of these two impactful criteria on several tasks, including composer and emotion classification, music generation, and sequence representation learning. We demonstrate that explicit information leads to better results depending on the task.

SDJul 20, 2021
Music Tempo Estimation via Neural Networks -- A Comparative Analysis

Mila Soares de Oliveira de Souza, Pedro Nuno de Souza Moura, Jean-Pierre Briot · mila

This paper presents a comparative analysis on two artificial neural networks (with different architectures) for the task of tempo estimation. For this purpose, it also proposes the modeling, training and evaluation of a B-RNN (Bidirectional Recurrent Neural Network) model capable of estimating tempo in bpm (beats per minutes) of musical pieces, without using external auxiliary modules. An extensive database (12,550 pieces in total) was curated to conduct a quantitative and qualitative analysis over the experiment. Percussion-only tracks were also included in the dataset. The performance of the B-RNN is compared to that of state-of-the-art models. For further comparison, a state-of-the-art CNN was also retrained with the same datasets used for the B-RNN training. Evaluation results for each model and datasets are presented and discussed, as well as observations and ideas for future research. Tempo estimation was more accurate for the percussion only dataset, suggesting that the estimation can be more accurate for percussion-only tracks, although further experiments (with more of such datasets) should be made to gather stronger evidence.

SEDec 8, 2021
From Procedures, Objects, Actors, Components, Services, to Agents -- A Comparative Analysis of the History and Evolution of Programming Abstractions

Jean-Pierre Briot

The objective of this chapter is to propose some retrospective analysis of the evolution of programming abstractions, from {\em procedures}, {\em objects}, {\em actors}, {\em components}, {\em services}, up to {\em agents}, %have some compare concepts of software component and of agent (and multi-agent system), %The method chosen is to by replacing them within a general historical perspective. Some common referential with three axes/dimensions is chosen: {\em action selection} at the level of one entity, {\em coupling flexibility} between entities, and {\em abstraction level}. We indeed may observe some continuous quest for higher flexibility (through notions such as {\em late binding}, or {\em reification} of {\em connections}) and higher level of {\em abstraction}. Concepts of components, services and agents have some common objectives (notably, {\em software modularity and reconfigurability}), with multi-agent systems raising further concepts of {\em autonomy} and {\em coordination}. notably through the notion of {\em auto-organization} and the use of {\em knowledge}. We hope that this analysis helps at highlighting some of the basic forces motivating the progress of programming abstractions and therefore that it may provide some seeds for the reflection about future programming abstractions.

LGJul 6, 2021
Energy Consumption of Deep Generative Audio Models

Constance Douwes, Philippe Esling, Jean-Pierre Briot

In most scientific domains, the deep learning community has largely focused on the quality of deep generative models, resulting in highly accurate and successful solutions. However, this race for quality comes at a tremendous computational cost, which incurs vast energy consumption and greenhouse gas emissions. At the heart of this problem are the measures that we use as a scientific community to evaluate our work. In this paper, we suggest relying on a multi-objective measure based on Pareto optimality, which takes into account both the quality of the model and its energy consumption. By applying our measure on the current state-of-the-art in generative audio models, we show that it can drastically change the significance of the results. We believe that this type of metric can be widely used by the community to evaluate their work, while putting computational cost -- and in fine energy consumption -- in the spotlight of deep learning research.

ASApr 7, 2020
From Artificial Neural Networks to Deep Learning for Music Generation -- History, Concepts and Trends

Jean-Pierre Briot

The current wave of deep learning (the hyper-vitamined return of artificial neural networks) applies not only to traditional statistical machine learning tasks: prediction and classification (e.g., for weather prediction and pattern recognition), but has already conquered other areas, such as translation. A growing area of application is the generation of creative content, notably the case of music, the topic of this paper. The motivation is in using the capacity of modern deep learning techniques to automatically learn musical styles from arbitrary musical corpora and then to generate musical samples from the estimated distribution, with some degree of control over the generation. This paper provides a tutorial on music generation based on deep learning techniques. After a short introduction to the topic illustrated by a recent exemple, the paper analyzes some early works from the late 1980s using artificial neural networks for music generation and how their pioneering contributions have prefigured current techniques. Then, we introduce some conceptual framework to analyze the various concepts and dimensions involved. Various examples of recent systems are introduced and analyzed to illustrate the variety of concerns and of techniques.

SDDec 9, 2017
Music Generation by Deep Learning - Challenges and Directions

Jean-Pierre Briot, François Pachet

In addition to traditional tasks such as prediction, classification and translation, deep learning is receiving growing attention as an approach for music generation, as witnessed by recent research groups such as Magenta at Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is in using the capacity of deep learning architectures and training techniques to automatically learn musical styles from arbitrary musical corpora and then to generate samples from the estimated distribution. However, a direct application of deep learning to generate content rapidly reaches limits as the generated content tends to mimic the training set without exhibiting true creativity. Moreover, deep learning architectures do not offer direct ways for controlling generation (e.g., imposing some tonality or other arbitrary constraints). Furthermore, deep learning architectures alone are autistic automata which generate music autonomously without human user interaction, far from the objective of interactively assisting musicians to compose and refine music. Issues such as: control, structure, creativity and interactivity are the focus of our analysis. In this paper, we select some limitations of a direct application of deep learning to music generation, analyze why the issues are not fulfilled and how to address them by possible approaches. Various examples of recent systems are cited as examples of promising directions.

SDSep 5, 2017
Deep Learning Techniques for Music Generation -- A Survey

Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet

This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.

CLMay 3, 2017
A Hybrid Architecture for Multi-Party Conversational Systems

Maira Gatti de Bayser, Paulo Cavalin, Renan Souza et al.

Multi-party Conversational Systems are systems with natural language interaction between one or more people or systems. From the moment that an utterance is sent to a group, to the moment that it is replied in the group by a member, several activities must be done by the system: utterance understanding, information search, reasoning, among others. In this paper we present the challenges of designing and building multi-party conversational systems, the state of the art, our proposed hybrid architecture using both rules and machine learning and some insights after implementing and evaluating one on the finance domain.