SD | AI | LG | AS | SP | Sep 19, 2024

ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning

arXiv:2409.12477v2 | 1 citation | h-index: 2
AI Analysis

This paper addresses the problem of synthesizing expressive polyphonic violin music for audio generation applications. The contribution is incremental: it builds on existing diffusion methods and adds a specific conditioning approach (pitch bend).

The paper tackles the challenge of modeling fundamental frequency (F0) contours for expressive violin synthesis in polyphonic music. It proposes a two-stage diffusion-based framework that, according to quantitative metrics and listening tests, generates more realistic violin sounds than a model without explicit pitch bend modeling.

Modeling the natural contour of fundamental frequency (F0) plays a critical role in music audio synthesis. However, transcribing and managing multiple F0 contours in polyphonic music is challenging, and explicit F0 contour modeling has not yet been explored for polyphonic instrumental synthesis. In this paper, we present ViolinDiff, a two-stage diffusion-based synthesis framework. For a given violin MIDI file, the first stage estimates the F0 contour as pitch bend information, and the second stage generates a mel spectrogram incorporating these expressive details. The quantitative metrics and listening test results show that the proposed model generates more realistic violin sounds than the model without explicit pitch bend modeling. Audio samples are available online: daewoung.github.io/ViolinDiff-Demo.
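
As a rough illustration of what representing an F0 contour as pitch bend information can mean, the sketch below converts a MIDI note plus a sequence of pitch-bend values into frequencies in Hz. It is not the ViolinDiff implementation: the function names are hypothetical, and the 14-bit bend encoding and the +/-2 semitone bend range are assumptions taken from common MIDI practice. The paper may represent pitch bend differently.

```python
# Illustrative sketch only; not code from the ViolinDiff paper or repository.
# Assumptions: equal temperament with A4 (MIDI note 69) = 440 Hz,
# 14-bit MIDI pitch-bend values (0..16383, center 8192),
# and a bend range of +/-2 semitones.

def midi_to_hz(note: float) -> float:
    """Convert a (possibly fractional) MIDI note number to frequency in Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def pitch_bend_to_semitones(bend_value: int, bend_range: float = 2.0) -> float:
    """Map a 14-bit pitch-bend value to a signed offset in semitones."""
    return (bend_value - 8192) / 8192.0 * bend_range


def f0_contour(note: int, bend_values: list[int], bend_range: float = 2.0) -> list[float]:
    """Return one F0 value in Hz per pitch-bend frame for a single held note."""
    return [
        midi_to_hz(note + pitch_bend_to_semitones(b, bend_range))
        for b in bend_values
    ]


if __name__ == "__main__":
    # A4 held while the bend rises from center toward a slightly sharp pitch.
    for hz in f0_contour(69, [8192, 8400, 8600, 8800]):
        print(f"{hz:.2f} Hz")
```

In a polyphonic setting each sounding note would need its own bend sequence, which is the bookkeeping the abstract describes as challenging.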

Code Implementations: 1 repo
