AS, LG, SD | Aug 30, 2020

Hierarchical Timbre-Painting and Articulation Generation

arXiv:2008.13095v2 | 12 citations | Has Code
AI Analysis

This work addresses the challenge of realistic music synthesis for applications such as audio production and creative tools, though it appears incremental because it builds on existing source-filtering and adversarial methods.

The paper tackles the problem of generating high-fidelity music audio that mimics the timbre and articulation of a target instrument from specified pitch and loudness inputs. It reports state-of-the-art timbre transfer when trained on a sample as short as a few minutes.

We present a fast and high-fidelity method for music generation, based on specified f0 and loudness, such that the synthesized audio mimics the timbre and articulation of a target instrument. The generation process consists of learned source-filtering networks, which reconstruct the signal at increasing resolutions. The model optimizes a multi-resolution spectral loss as the reconstruction loss, an adversarial loss to make the audio sound more realistic, and a perceptual f0 loss to align the output to the desired input pitch contour. The proposed architecture enables high-quality fitting of an instrument, given a sample that can be as short as a few minutes, and the method demonstrates state-of-the-art timbre transfer capabilities. Code and audio samples are shared at https://github.com/mosheman5/timbre_painting.
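The multi-resolution spectral loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the FFT sizes, the hop of one quarter of the window, the epsilon, and the choice to sum L1 distances on linear and log magnitudes are all assumptions made for illustration. The abstract states only that a spectral loss is computed at multiple resolutions.

```python
import numpy as np


def stft_magnitude(signal, n_fft, hop):
    """Magnitude spectrogram from Hann-windowed frames and a real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_resolution_spectral_loss(pred, target, fft_sizes=(256, 512, 1024), eps=1e-7):
    """Sum over FFT sizes of L1 distances on linear and log magnitudes.

    The FFT sizes, hop, and eps are illustrative assumptions, not values
    taken from the paper.
    """
    loss = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        p = stft_magnitude(pred, n_fft, hop)
        t = stft_magnitude(target, n_fft, hop)
        linear_term = np.mean(np.abs(p - t))
        log_term = np.mean(np.abs(np.log(p + eps) - np.log(t + eps)))
        loss += linear_term + log_term
    return float(loss)


# Toy signals: one second of a 440 Hz sine and a detuned copy.
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
target = np.sin(2 * np.pi * 440.0 * time)
detuned = np.sin(2 * np.pi * 466.16 * time)

same_loss = multi_resolution_spectral_loss(target, target)
detuned_loss = multi_resolution_spectral_loss(detuned, target)
print(same_loss, detuned_loss)
```

Comparing spectra at several window sizes trades time resolution against frequency resolution: short windows are more sensitive to transients and articulation, while long windows resolve harmonics more finely. In a training setup this term would be combined with the adversarial and f0 losses the abstract lists, which are not sketched here.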

Code Implementations: 1 repo