SDLGAS, Feb 5, 2021

Multi-Task Self-Supervised Pre-Training for Music Classification

arXiv:2102.03229v1, 39 citations
Originality: Incremental advance
AI Analysis

This work aims to improve music classification for researchers and practitioners by mitigating the need for extensive labeled datasets, which are costly and time-consuming to acquire.

This paper addresses the challenge of limited labeled data in music classification by applying multi-task self-supervised pre-training. The authors explore different encoder architectures, loss-weighting mechanisms, and selections of pretext tasks, and find that combining music-specific workers (the pretext-task prediction heads) with loss balancing improves generalization to downstream tasks.
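To make "loss balancing" concrete, here is a minimal sketch of one well-known weighting mechanism, homoscedastic-uncertainty weighting (Kendall et al., 2018), applied to several worker losses. This is an illustrative assumption: the summary does not say which weighting schemes the authors compare, and the worker names (`mfcc`, `chroma`, `tempogram`) and loss values below are hypothetical. Each worker i gets a learnable log-variance s_i, and the combined objective is sum_i (exp(-s_i) * L_i + s_i).

```python
import math

def weighted_total(losses, log_vars):
    """Uncertainty-style weighting: sum_i exp(-s_i) * L_i + s_i."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

def grad_log_vars(losses, log_vars):
    """Partial derivative of weighted_total w.r.t. each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(losses, log_vars)]

def fit_log_vars(losses, steps=2000, lr=0.1):
    """Gradient descent on the s_i only, holding the worker losses fixed."""
    log_vars = [0.0] * len(losses)
    for _ in range(steps):
        grads = grad_log_vars(losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars

# Hypothetical per-worker losses on very different scales.
worker_losses = {"mfcc": 4.0, "chroma": 0.25, "tempogram": 1.0}
log_vars = fit_log_vars(list(worker_losses.values()))
for (name, loss), s in zip(worker_losses.items(), log_vars):
    weight = math.exp(-s)
    print(f"{name}: loss={loss:.2f} weight={weight:.3f} weighted={weight * loss:.3f}")
```

With the losses held fixed, each s_i converges to log L_i, so every weighted term exp(-s_i) * L_i approaches 1 and no single worker dominates the objective. In real pre-training the s_i would be optimized jointly with the encoder while the worker losses change; the `+ s_i` term keeps the weights from collapsing to zero.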

Deep learning is very data-hungry, and supervised learning in particular requires massive amounts of labeled data to work well. Machine listening research often suffers from a lack of labeled data, as human annotations are costly to acquire, and annotating audio is time-consuming and less intuitive. Moreover, models trained on a labeled dataset often embed biases specific to that particular dataset. Unsupervised learning techniques have therefore become popular approaches to machine listening problems. In particular, a self-supervised learning technique that reconstructs multiple hand-crafted audio features has shown promising results in the speech domain, for tasks such as emotion recognition and automatic speech recognition (ASR). In this paper, we apply self-supervised and multi-task learning methods to pre-train music encoders, and explore various design choices, including encoder architectures, weighting mechanisms for combining the losses of multiple tasks, and the selection of workers (pretext tasks). We investigate how these design choices interact with various downstream music classification tasks. We find that using a variety of music-specific workers together with weighting mechanisms that balance the losses during pre-training improves performance and generalization on the downstream tasks.
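The idea of reconstructing multiple hand-crafted audio features can be sketched as follows. This is a simplified illustration under stated assumptions: the two workers here (frame-level RMS energy and zero-crossing rate) are stand-ins chosen for brevity, not the paper's worker set; `WORKERS`, `worker_targets`, and `worker_losses` are hypothetical names; and an all-zeros prediction stands in for the outputs of a real encoder plus worker heads. Each worker regresses a feature computed directly from the raw audio, so no human labels are needed.

```python
import math

def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping the ragged tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical worker registry: name -> hand-crafted feature used as a regression target.
WORKERS = {"rms": rms, "zcr": zero_crossing_rate}

def worker_targets(signal, frame_len=256, hop=128):
    """Per-frame self-supervised targets for every worker, computed from raw audio."""
    fr = frames(signal, frame_len, hop)
    return {name: [fn(f) for f in fr] for name, fn in WORKERS.items()}

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def worker_losses(predictions, targets):
    """One reconstruction loss per worker; a weighting scheme later combines them."""
    return {name: mse(predictions[name], targets[name]) for name in targets}

# Usage: a 1-second 440 Hz tone at 16 kHz, scored against an all-zeros "prediction".
sr = 16000
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
targets = worker_targets(signal)
zeros = {name: [0.0] * len(t) for name, t in targets.items()}
losses = worker_losses(zeros, targets)
print(losses)
```

On a pure tone the RMS target sits near 0.707 while the zero-crossing rate is around 0.055, so the two reconstruction losses differ by roughly two orders of magnitude. That kind of scale mismatch between workers is what loss-weighting mechanisms are meant to correct.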
