TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure
This work addresses the challenge of structured music generation, which is relevant to both composers and AI researchers, and represents an incremental advance in deep music generation.
The paper tackles the problem of generating multi-track electronic music with full-song structure by introducing TOMI, a novel approach based on concept hierarchy and instruction-tuned LLMs, which produces higher-quality music with stronger structural coherence than the baselines.
Hierarchical planning is a powerful approach to modeling the structure of long sequences. Aside from considering hierarchies in the temporal structure of music, this paper explores an even more important aspect: concept hierarchy, which involves generating music ideas, transforming them, and ultimately organizing them, across musical time and space, into a complete composition. To this end, we introduce TOMI (Transforming and Organizing Music Ideas) as a novel approach in deep music generation and develop a TOMI-based model using an instruction-tuned foundation LLM. Formally, we represent a multi-track composition process as a sparse, four-dimensional space characterized by clips (short audio or MIDI segments), sections (temporal positions), tracks (instrument layers), and transformations (elaboration methods). Our model is capable of generating multi-track electronic music with full-song structure, and we further integrate the TOMI-based model with the REAPER digital audio workstation, enabling interactive human-AI co-creation. Experimental results demonstrate that our approach produces higher-quality electronic music with stronger structural coherence than the baselines.
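To make the sparse, four-dimensional representation more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that a composition can be stored as a set of populated (clip, section, track, transformation) cells. All class names, field names, and the example transformation names (`repeat`, `fill`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# All types below are hypothetical stand-ins for the four dimensions
# named in the abstract; the paper's actual data model may differ.
@dataclass(frozen=True)
class Clip:
    name: str
    kind: str  # "midi" or "audio"


@dataclass(frozen=True)
class Section:
    name: str
    start_bar: int    # assumed way to encode a temporal position
    length_bars: int


@dataclass(frozen=True)
class Track:
    name: str


@dataclass(frozen=True)
class Transformation:
    name: str  # illustrative label for an elaboration method


@dataclass
class Composition:
    # Sparse storage: only populated cells of the 4-D space are kept.
    cells: set = field(default_factory=set)

    def place(self, clip, section, track, transformation):
        """Mark one (clip, section, track, transformation) cell as used."""
        self.cells.add((clip, section, track, transformation))

    def by_section(self):
        """Group populated cells by section name, sorted for stable output."""
        grouped = defaultdict(list)
        for clip, section, track, transformation in self.cells:
            grouped[section.name].append(
                (track.name, clip.name, transformation.name)
            )
        return {name: sorted(items) for name, items in grouped.items()}

    def density(self, n_clips, n_sections, n_tracks, n_transformations):
        """Fraction of the full 4-D grid that is actually populated."""
        total = n_clips * n_sections * n_tracks * n_transformations
        return len(self.cells) / total


# Example: two clips reused across two sections and two tracks.
drums = Clip("drum_loop", "audio")
bass = Clip("bass_riff", "midi")
intro = Section("intro", start_bar=0, length_bars=8)
drop = Section("drop", start_bar=8, length_bars=16)
drum_track = Track("drums")
bass_track = Track("bass")
repeat = Transformation("repeat")
fill = Transformation("fill")

song = Composition()
song.place(drums, intro, drum_track, repeat)
song.place(drums, drop, drum_track, repeat)
song.place(bass, drop, bass_track, fill)

print(song.by_section())
print(song.density(2, 2, 2, 2))
```

In this sketch the same clip (`drum_loop`) appears in two sections, which illustrates how one music idea can be reused and organized across musical time, while most of the 16 possible cells stay empty, which is what makes the space sparse.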