Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
This work addresses the need for flexible and efficient symbolic music arrangement tools for musicians and composers. Its contribution is incremental in that it builds on existing tokenization and modeling techniques.
The authors tackle automatic multitrack music arrangement with a unified framework that handles diverse scenarios such as reinterpretation and simplification, and they report state-of-the-art performance on tasks such as band arrangement and piano reduction in both objective metrics and perceptual evaluations.
We present a unified framework for automatic multitrack music arrangement that enables a single pre-trained symbolic music model to handle diverse arrangement scenarios, including reinterpretation, simplification, and additive generation. At its core is a segment-level reconstruction objective operating on token-level disentangled content and style, allowing for flexible any-to-any instrumentation transformations at inference time. To support track-wise modeling, we introduce REMI-z, a structured tokenization scheme for multitrack symbolic music that enhances modeling efficiency and effectiveness for both arrangement tasks and unconditional generation. Our method outperforms task-specific state-of-the-art models on representative tasks from different arrangement scenarios (band arrangement, piano reduction, and drum arrangement) in both objective metrics and perceptual evaluations. Taken together, our framework demonstrates strong generality and suggests broader applicability in symbolic music-to-music transformation.
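To make the two central ideas concrete, the following is a minimal sketch of a track-wise tokenization and of a segment-level reconstruction pair. It is an illustration under stated assumptions, not the REMI-z specification or the authors' implementation: the abstract does not define the vocabulary, so the token names (`i-`, `o-`, `p-`, `d-`, `b-1`), the `Note` fields, the choice of instrument set as "style" and instrument-free notes as "content", and both function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    bar: int       # bar index within the segment
    program: int   # MIDI program number identifying the track
    position: int  # onset inside the bar, in grid steps (assumed grid)
    pitch: int     # MIDI pitch
    duration: int  # length in grid steps


def note_order(n):
    # Earlier onsets first; within an onset, higher pitches first.
    return (n.position, -n.pitch)


def tokenize_track_wise(notes):
    """Serialize notes bar by bar, and track by track inside each bar.

    Hypothetical vocabulary: i-<program> opens a track, o-/p-/d- encode
    onset, pitch, and duration, and b-1 closes a bar. Bars with no notes
    are simply skipped in this sketch.
    """
    bars = defaultdict(lambda: defaultdict(list))
    for n in notes:
        bars[n.bar][n.program].append(n)

    tokens = []
    for bar in sorted(bars):
        for program in sorted(bars[bar]):
            tokens.append(f"i-{program}")
            for n in sorted(bars[bar][program], key=note_order):
                tokens += [f"o-{n.position}", f"p-{n.pitch}", f"d-{n.duration}"]
        tokens.append("b-1")
    return tokens


def make_reconstruction_pair(notes):
    """Build a (condition, target) pair for one segment.

    Assumption: "style" is the set of instruments, "content" is the note
    sequence with instrument identity removed. The target is the full
    track-wise sequence, so a model trained on such pairs learns to
    redistribute content over the requested instruments.
    """
    style = [f"i-{p}" for p in sorted({n.program for n in notes})]

    content = []
    for bar in sorted({n.bar for n in notes}):
        in_bar = sorted((n for n in notes if n.bar == bar), key=note_order)
        for n in in_bar:
            content += [f"o-{n.position}", f"p-{n.pitch}", f"d-{n.duration}"]
        content.append("b-1")

    return style + content, tokenize_track_wise(notes)
```

At inference time, under the same assumptions, replacing the style tokens in the condition with a different instrument set is what would request an any-to-any instrumentation change, for example asking for a single piano track to obtain a piano reduction.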