D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation From Lead sheet
This work addresses a domain-specific challenge in symbolic music generation for piano accompaniment, offering incremental improvements over existing methods.
The paper addresses piano accompaniment generation from lead sheets by proposing D3PIA, a discrete diffusion model that uses neighborhood attention to align the accompaniment locally with the melody and chord conditions. Compared with the baselines, D3PIA preserves the given chords more faithfully and produces more musically coherent accompaniments.
Generating piano accompaniments in the symbolic music domain is a challenging task that requires producing a complete piece of piano music from given melody and chord constraints, such as those provided by a lead sheet. In this paper, we propose D3PIA, a discrete diffusion-based piano accompaniment generation model that leverages the local alignment between the lead sheet and the accompaniment in a piano-roll representation. D3PIA incorporates Neighborhood Attention (NA) both to encode the lead sheet and to use it as a condition when predicting note states in the piano accompaniment. This design enhances local contextual modeling by efficiently attending to nearby melody and chord conditions. We evaluate our model on the POP909 dataset, a widely used benchmark for piano accompaniment generation. Objective evaluation results demonstrate that D3PIA preserves chord conditions more faithfully than continuous diffusion-based and Transformer-based baselines. Furthermore, a subjective listening test indicates that D3PIA generates more musically coherent accompaniments than the comparison models.
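To make the idea of attending only to nearby melody and chord conditions concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional simplification along the time axis, frame-aligned lead-sheet and accompaniment sequences, a single attention head, and no learned projections; the abstract does not specify any of these details. The function names `neighborhood_window` and `neighborhood_cross_attention` and the `kernel_size` parameter are hypothetical.

```python
import math


def neighborhood_window(t, length, kernel_size):
    """Return the condition-frame indices that query frame t attends to.

    The window is centered on t and shifted at the borders so that it
    always holds kernel_size frames (or the whole sequence if shorter).
    """
    k = min(kernel_size, length)
    start = min(max(t - k // 2, 0), length - k)
    return list(range(start, start + k))


def neighborhood_cross_attention(queries, keys, values, kernel_size=3):
    """Local cross-attention: each query frame sees only nearby condition frames.

    queries: accompaniment frame features (assumed, list of vectors)
    keys, values: lead-sheet frame features (assumed, list of vectors)
    """
    length, dim = len(keys), len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        idx = neighborhood_window(t, length, kernel_size)
        # Scaled dot-product scores restricted to the local window.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim) for j in idx
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors inside the window.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, idx))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Because every query frame looks at a fixed number of condition frames, the cost of this sketch grows linearly with sequence length for a fixed `kernel_size`, rather than quadratically as in full attention. That is the efficiency argument usually made for neighborhood attention.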