SD · LG · MM · AS · SP · Feb 12, 2022

Deep Performer: Score-to-Audio Music Performance Synthesis

arXiv:2202.06034v2 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses music synthesis for applications such as composition and education, but the advance is incremental: it adapts text-to-speech methods to music.

The paper tackles score-to-audio music performance synthesis by introducing Deep Performer, a transformer-based system that handles polyphony and long notes, achieving competitive quality on a violin dataset and significantly outperforming a baseline on a piano dataset.

Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing a fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre and noise level. Moreover, our proposed model significantly outperforms the baseline on an existing piano dataset in overall quality.

