SD AI CL · Dec 1, 2025

Story2MIDI: Emotionally Aligned Music Generation from Text

arXiv:2512.02192v1 · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of emotionally aligned music generation for applications in creative AI, though it is incremental due to reliance on existing datasets and methods.

The paper tackles the problem of generating music whose emotion matches that of an input text. It introduces Story2MIDI, a Transformer-based model, and constructs a paired dataset by merging existing datasets for text sentiment analysis and music emotion classification. Results show that the model learns emotion-relevant features and produces samples with diverse emotional responses, as confirmed by objective metrics and human evaluation.

In this paper, we introduce Story2MIDI, a sequence-to-sequence Transformer-based model for generating emotion-aligned music from a given piece of text. To develop this model, we construct the Story2MIDI dataset by merging existing datasets for sentiment analysis from text and emotion classification in music. The resulting dataset contains pairs of text blurbs and music pieces that evoke the same emotions in the reader or listener. Despite the small scale of our dataset and limited computational resources, our results indicate that our model effectively learns emotion-relevant features in music and incorporates them into its generation process, producing samples with diverse emotional responses. We evaluate the generated outputs using objective musical metrics and a human listening study, confirming the model's ability to capture intended emotional cues.
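The abstract says the dataset pairs text blurbs with music pieces that evoke the same emotion, but it does not describe the pairing procedure. The following is a minimal sketch of one way such pairs could be formed, assuming both source datasets have already been mapped to a shared set of emotion labels. The function name `pair_by_emotion`, the label names, the example texts, and the file names are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def pair_by_emotion(texts, pieces, seed=0):
    """Pair each text with a randomly chosen music piece that shares its label.

    texts:  list of (text, emotion_label) tuples
    pieces: list of (midi_path, emotion_label) tuples
    Returns a list of (text, midi_path, emotion_label) tuples.
    """
    rng = random.Random(seed)  # seeded so the pairing is reproducible

    # Index the music pieces by emotion label.
    by_label = defaultdict(list)
    for path, label in pieces:
        by_label[label].append(path)

    pairs = []
    for text, label in texts:
        candidates = by_label.get(label)
        if candidates:  # skip texts whose label has no matching music
            pairs.append((text, rng.choice(candidates), label))
    return pairs


# Hypothetical toy data; labels and file names are illustrative only.
texts = [
    ("We finally won the championship!", "joy"),
    ("The house felt empty after she left.", "sadness"),
    ("Something moved in the dark hallway.", "fear"),
]
pieces = [
    ("piece_001.mid", "joy"),
    ("piece_002.mid", "sadness"),
    ("piece_003.mid", "joy"),
]

pairs = pair_by_emotion(texts, pieces)
for text, path, label in pairs:
    print(f"{label}: {text!r} -> {path}")
```

In this toy example the "fear" text is dropped because no music piece carries that label. A real pipeline would also need an explicit mapping between the label schemes of the two source datasets, which this sketch takes as given.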

