Persian MusicGen: A Large-Scale Dataset and Culturally-Aware Generative Model for Persian Music
This work provides a resource and a method for adapting music generation to underrepresented cultural contexts, addressing a gap in non-Western music AI.
The authors created the first large-scale Persian music dataset (900+ hours) and fine-tuned MusicGen to generate culturally aligned Persian music, outperforming the base model in stylistic adherence.
Persian music, with its unique tonalities, modal systems (Dastgah), and rhythmic structures, presents significant challenges for music generation models trained primarily on Western music. We address this gap by curating the first large-scale dataset of Persian songs, comprising over 900 hours of high-quality audio across diverse sub-genres, including pop, traditional, and contemporary styles. This dataset captures the rich melodic and cultural diversity of Persian music and serves as the foundation for fine-tuning MusicGen, a state-of-the-art generative music model. We adapt MusicGen to this domain and evaluate it using both subjective and objective metrics. To assess the semantic alignment between generated music and the intended style tags, we report the proportion of relevant tags that are accurately reflected in the generated outputs. Our results demonstrate that the fine-tuned model produces compositions that align more closely with Persian stylistic conventions than those of the base model. This work introduces a new resource for generative music research and illustrates the adaptability of music generation models to underrepresented cultural and linguistic contexts.
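The tag-based alignment measure described above can be illustrated with a minimal Python sketch. The abstract does not say how tags are recovered from generated audio, so this sketch assumes that some external step (for example, an automatic tagger or human annotators) has already produced a set of detected tags for each output. The function names `tag_alignment` and `mean_tag_alignment` and the example tags are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def tag_alignment(intended: Iterable[str], detected: Iterable[str]) -> float:
    """Return the fraction of intended style tags found among the detected tags.

    Assumption: tags are compared case-insensitively after trimming whitespace.
    """
    intended_set = {t.strip().lower() for t in intended}
    detected_set = {t.strip().lower() for t in detected}
    if not intended_set:
        # No intended tags means there is nothing to match; score it as 0.0.
        return 0.0
    return len(intended_set & detected_set) / len(intended_set)


def mean_tag_alignment(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> float:
    """Average the per-sample alignment over (intended, detected) tag pairs."""
    scores = [tag_alignment(intended, detected) for intended, detected in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical example: two generated clips with their prompt tags
    # and the tags detected in the generated audio.
    samples = [
        (["pop"], ["pop", "vocal"]),
        (["pop", "traditional"], ["Pop", "santur"]),
    ]
    print(mean_tag_alignment(samples))
```

Under these assumptions, a higher average for the fine-tuned model than for the base model on the same prompts would indicate closer adherence to the requested styles.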