IRNov 26, 2021

Emotion Embedding Spaces for Matching Music to Stories

arXiv:2111.13468v120 citations
Originality Incremental advance
AI Analysis

This work helps content creators find music to enhance text-based stories, but it is incremental as it builds on existing emotion datasets and retrieval methods.

The paper tackled the problem of matching music to stories based on emotion by formalizing it as a cross-modal text-to-music retrieval task, and it achieved successful retrieval by leveraging emotion embedding spaces, including valence-arousal and data-driven methods, to bridge mismatched emotion vocabularies.

Content creators often use music to enhance their stories, as it can be a powerful tool to convey emotion. In this paper, our goal is to help creators find music to match the emotion of their story. We focus on text-based stories that can be auralized (e.g., books), use multiple sentences as input queries, and automatically retrieve matching music. We formalize this task as a cross-modal text-to-music retrieval problem. Both the music and text domains have existing datasets with emotion labels, but mismatched emotion vocabularies prevent us from using mood or emotion annotations directly for matching. To address this challenge, we propose and investigate several emotion embedding spaces, both manually defined (e.g., valence/arousal) and data-driven (e.g., Word2Vec and metric learning) to bridge this gap. Our experiments show that by leveraging these embedding spaces, we are able to successfully bridge the gap between modalities to facilitate cross modal retrieval. We show that our method can leverage the well established valence-arousal space, but that it can also achieve our goal via data-driven embedding spaces. By leveraging data-driven embeddings, our approach has the potential of being generalized to other retrieval tasks that require broader or completely different vocabularies.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes