MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
This addresses a gap for amateur music producers seeking co-creative AI assistance, though the contribution is incremental, since it builds on existing audio-language models.
The authors address the lack of collaborative AI tools for music mixing by introducing MixAssist, a dataset of audio-language dialogues recorded during expert-amateur mixing sessions. They show that fine-tuning models such as Qwen-Audio on it yields promising results, with Qwen significantly outperforming the other tested models in generating helpful mixing advice.
While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge assessments and human expert comparisons, demonstrate that fine-tuning models such as Qwen-Audio on MixAssist can yield promising results, with Qwen significantly outperforming other tested models in generating helpful, contextually relevant mixing advice. By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of intelligent AI assistants designed to support and augment the creative process in music mixing.