SD | AI | LG | MM | AS | Dec 5, 2022

MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning

DeepMind
arXiv:2212.02508v1 | 32 citations | h-index: 42 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient self-supervised learning methods in music audio processing, offering a simpler and more parameter-efficient baseline for researchers and practitioners.

The paper tackles the problem of self-supervised representation learning for raw music waveforms by introducing Music2Vec, a framework that achieves results comparable to the state-of-the-art model Jukebox while using less than 2% of its parameters.

The deep learning community has witnessed exponentially growing interest in self-supervised learning (SSL). However, how to build a framework for learning useful representations of raw music waveforms in a self-supervised manner remains unexplored. In this work, we design Music2Vec, a framework that explores different SSL algorithmic components and tricks for music audio recordings. Our model achieves results comparable to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller, with fewer than 2% of the latter's parameters. The model will be released on Hugging Face (see https://huggingface.co/m-a-p/music2vec-v1).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
