LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance
LabelBuddy helps researchers and developers in Music Information Retrieval (MIR) create richer, human-aligned representations of audio, which are crucial for advancing machine learning and large audio language models.
This paper introduces LabelBuddy, an open-source collaborative audio annotation tool that addresses the scarcity of infrastructure for capturing subjective nuances in audio annotation. It allows users to integrate custom AI models for pre-annotation by decoupling the interface from inference through containerized backends.
The advancement of machine learning (ML), Large Audio Language Models (LALMs), and autonomous AI agents in Music Information Retrieval (MIR) necessitates a shift from static tagging to rich, human-aligned representation learning. However, the scarcity of open-source infrastructure capable of capturing the subjective nuances of audio annotation remains a critical bottleneck. This paper introduces \textbf{LabelBuddy}, an open-source collaborative auto-tagging audio annotation tool designed to bridge the gap between human intent and machine understanding. Unlike static tools, it decouples the interface from inference via containerized backends, allowing users to plug in custom models for AI-assisted pre-annotation. We describe the system architecture, which supports multi-user consensus and containerized model isolation, and outline a roadmap for extending support to agents and LALMs. Code is available at https://github.com/GiannisProkopiou/gsoc2022-Label-buddy.
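To make the two architectural ideas above more concrete, the following is a minimal sketch and not LabelBuddy's actual code. It assumes that a model backend can be treated as a pluggable callable (standing in for a request to a separate container) and that multi-user consensus is a simple majority vote over tags. The names `PreAnnotator`, `pre_annotate`, `consensus`, `dummy_backend`, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for a containerized model backend. In a real
# deployment this callable would wrap a network request to a separate
# container; here it is just a function from an audio path to tags.
PreAnnotator = Callable[[str], List[str]]


def pre_annotate(audio_path: str, backend: PreAnnotator) -> List[str]:
    """Ask a pluggable backend for suggested tags.

    The caller only depends on the callable's signature, never on the
    model itself, which is the decoupling the sketch illustrates.
    """
    return backend(audio_path)


def consensus(votes: Dict[str, List[str]], min_agreement: float = 0.5) -> List[str]:
    """Keep tags chosen by more than `min_agreement` of the annotators.

    Majority voting is an assumed rule for illustration only.
    """
    if not votes:
        return []
    # Count each tag at most once per annotator.
    counts = Counter(tag for tags in votes.values() for tag in set(tags))
    total = len(votes)
    return sorted(tag for tag, n in counts.items() if n / total > min_agreement)


def dummy_backend(audio_path: str) -> List[str]:
    """Toy backend: suggests 'speech' for .wav files, nothing otherwise."""
    return ["speech"] if audio_path.endswith(".wav") else []


suggested = pre_annotate("clip.wav", dummy_backend)
votes = {
    "alice": suggested + ["music"],  # accepts the suggestion, adds a tag
    "bob": ["music"],                # rejects the suggestion
    "carol": ["music", "speech"],
}
agreed = consensus(votes)
```

In this toy run, `music` is chosen by all three annotators and `speech` by two of three, so both pass the default threshold. Swapping `dummy_backend` for another callable changes the suggestions without touching the annotation logic.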