Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
This work addresses the challenge of generating high-quality symbolic music for expert composers and interactive applications, and it offers an incremental improvement in guidance methods for diffusion models.
The paper tackles the problem of generating symbolic music with high precision in note pitch despite limited data by introducing an efficient Fine-Grained Guidance (FGG) approach for diffusion models. Numerical experiments and subjective evaluation indicate improved accuracy, listenability, and quality of the generated music.
Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical to improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances, which enables real-time interactive generation.