SD DB IR LG AS
Apr 27, 2021

MULTIMODAL ANALYSIS: Informed content estimation and audio source separation

arXiv:2104.13276v3
Originality: Synthesis-oriented
AI Analysis

This work addresses multimodal analysis in music, but appears incremental as it builds on existing concepts of audio-text interaction.

This dissertation tackles the problem of analyzing musical signals by studying multimodal learning between audio and lyrics, focusing on source separation and informed content estimation.

This dissertation studies multimodal learning in the context of musical signals. Throughout, we focus on the interaction between audio signals and text information. Among the many text sources related to music that can be used (e.g., reviews, metadata, or social network feedback), we concentrate on lyrics. The singing voice directly connects the audio signal and the text information in a unique way, combining melody and lyrics where a linguistic dimension complements the abstraction of musical instruments. Our study focuses on the interaction between audio and lyrics for two tasks: source separation and informed content estimation.

