Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification
This work addresses music classification for applications such as tagging and recommendation, but the contribution appears incremental because it builds on existing feature aggregation methods.
The authors tackled the problem of music classification by aggregating multi-level and multi-scale features from sample-level deep convolutional neural networks trained on raw waveforms, achieving state-of-the-art results on several datasets.
The tag words used to describe music audio have different levels of abstraction. Taking this into account, we propose a music classification approach that aggregates multi-level and multi-scale features using pre-trained feature extractors. In particular, the feature extractors are sample-level deep convolutional neural networks trained on raw waveforms. We show that this approach achieves state-of-the-art results on several music classification datasets.
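The aggregation step described above can be illustrated with a minimal sketch. It assumes that each pre-trained network (one per input scale) exposes the activations of several layers as (time, channels) arrays, that averaging over time is used to summarize each layer, and that the summaries are simply concatenated. The function names, layer shapes, and the choice of mean pooling are all hypothetical, and the random arrays stand in for real network activations.

```python
import numpy as np

def pool_over_time(activations):
    """Collapse a (time, channels) activation map to a (channels,) vector.

    Mean pooling is an assumption made for this sketch.
    """
    return activations.mean(axis=0)

def aggregate_features(activations_by_model):
    """Concatenate time-pooled activations from every layer of every model.

    activations_by_model: list with one entry per model (input scale);
    each entry is a list with one (time, channels) array per chosen layer.
    """
    pooled = [
        pool_over_time(layer)
        for model in activations_by_model   # multi-scale: loop over models
        for layer in model                  # multi-level: loop over layers
    ]
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
# Placeholder activations for two hypothetical models, three layers each.
fake_activations = [
    [rng.standard_normal((81, 128)),
     rng.standard_normal((27, 256)),
     rng.standard_normal((9, 256))],
    [rng.standard_normal((243, 128)),
     rng.standard_normal((81, 256)),
     rng.standard_normal((27, 512))],
]

clip_vector = aggregate_features(fake_activations)
print(clip_vector.shape)  # (1536,)
```

The resulting fixed-length vector could then be passed to any downstream classifier. Layers with different time resolutions can be combined this way because pooling removes the time axis before concatenation.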