SD, CL, AS; Sep 5, 2022

Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation

arXiv:2209.01996v1
h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses music description generation, with applications in music recommendation and accessibility. Its contribution is incremental: it builds on existing sequence-to-sequence frameworks and adds novel components tailored to this specific domain.

The paper tackles the challenge of generating text descriptions for music. It constructs a new dataset from crowdsourced comments and proposes a sequence-to-sequence model with a dilated convolutional encoder and a memory-based RNN decoder. The model is then fine-tuned with a discriminator and a topic evaluator. According to the paper's new evaluation metrics, the generated comments are fluent and meaningful.

We consider a novel task of automatically generating text descriptions of music. Compared with other well-established text generation tasks such as image captioning, the scarcity of well-paired music and text datasets makes it a much more challenging task. In this paper, we exploit crowd-sourced music comments to construct a new dataset and propose a sequence-to-sequence model to generate text descriptions of music. More concretely, we use the dilated convolutional layer as the basic component of the encoder and a memory-based recurrent neural network as the decoder. To enhance the authenticity and thematicity of generated texts, we further propose to fine-tune the model with a discriminator as well as a novel topic evaluator. To measure the quality of generated texts, we also propose two new evaluation metrics, which are more aligned with human evaluation than traditional metrics such as BLEU. Experimental results verify that our model is capable of generating fluent and meaningful comments that retain the thematic and content information of the original music.
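To make the encoder's building block concrete, here is a minimal pure-Python sketch of a stack of 1-D dilated convolutions. It is an illustration only, not the paper's implementation: the kernel weights, the kernel size of 2, the dilation schedule (1, 2, 4), and the function names `dilated_conv1d`, `receptive_field`, and `encode` are all assumptions chosen for the example. The abstract does not specify these details.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution (cross-correlation) over a list of floats."""
    span = (len(kernel) - 1) * dilation  # distance between first and last kernel tap
    out = []
    for start in range(len(signal) - span):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart instead of being adjacent.
            acc += w * signal[start + k * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def encode(signal, kernel, dilations):
    """Apply one dilated convolution per entry in `dilations` (hypothetical schedule)."""
    x = list(signal)
    for d in dilations:
        x = dilated_conv1d(x, kernel, d)
    return x


if __name__ == "__main__":
    audio = [float(i) for i in range(16)]  # stand-in for an audio feature sequence
    features = encode(audio, kernel=[1.0, 1.0], dilations=[1, 2, 4])
    print(len(features), receptive_field(2, [1, 2, 4]))
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth while the parameter count grows only linearly, which is the usual motivation for dilated convolutions on long sequences such as audio.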
