SD AS | Nov 26, 2021

Semi-Supervised Music Tagging Transformer

arXiv:2111.13457v121.052 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses music tagging for applications like recommendation systems, but it is incremental as it adapts existing transformer and semi-supervised methods to this domain.

The paper proposes a transformer-based music tagging model that uses convolutional layers to capture local acoustic features and self-attention layers to summarize those features over time. Under supervised training, the model outperforms the previous state-of-the-art CNN-based models, and semi-supervised noisy student training improves it further.

We present the Music Tagging Transformer, which is trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of extracted features using stacked self-attention layers. Through a careful model assessment, we first show that, under a supervised scheme, the proposed architecture outperforms the previous state-of-the-art music tagging models, which are based on convolutional neural networks. The Music Tagging Transformer is further improved by noisy student training, a semi-supervised approach that leverages both labeled and unlabeled data combined with data augmentation. To the best of our knowledge, this is the first attempt to utilize the entire audio of the Million Song Dataset.
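
The noisy student procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a one-dimensional logistic classifier stands in for the Music Tagging Transformer, Gaussian input noise stands in for audio data augmentation, and the use of soft pseudo-labels is an assumption. All names (`train`, `predict`, `noisy_student`) and the data are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5, noise=0.0, rng=None):
    """Fit a 1-D logistic model on (x, target) pairs with plain SGD.

    Targets may be hard labels (0 or 1) or soft pseudo-labels in [0, 1].
    `noise` is the std of Gaussian input noise, a toy stand-in for the
    data augmentation applied to the student (assumption).
    """
    rng = rng or random.Random(0)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            x_in = x + rng.gauss(0.0, noise)  # perturb the input
            p = sigmoid(w * x_in + b)
            grad = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w -= lr * grad * x_in
            b -= lr * grad
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(w * x + b)


def noisy_student(labeled, unlabeled, noise=0.3, seed=0):
    rng = random.Random(seed)
    # 1. Train the teacher on labeled data only, without noise.
    teacher = train(labeled, rng=rng)
    # 2. The teacher assigns soft pseudo-labels to the unlabeled inputs.
    pseudo = [(x, predict(teacher, x)) for x in unlabeled]
    # 3. Train the student on labeled plus pseudo-labeled data, with noise.
    student = train(labeled + pseudo, noise=noise, rng=rng)
    return teacher, student


# Hypothetical toy data: negative inputs are class 0, positive are class 1.
labeled = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
unlabeled = [-3.0, -1.5, -0.5, 0.5, 1.5, 3.0]

teacher, student = noisy_student(labeled, unlabeled)
print("teacher:", round(predict(teacher, 2.5), 3))
print("student:", round(predict(student, 2.5), 3))
```

The point of the sketch is the data flow: only the student sees noised inputs and the unlabeled pool, while the teacher sees clean labeled data. In practice the student can replace the teacher and the cycle can be repeated; whether and how the paper does so is not stated in the abstract.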
