CV Oct 30, 2018

General audio tagging with ensembling convolutional neural network and statistical features

Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang

arXiv:1810.12832v11.734 citations Has Code

Originality: Incremental advance

AI Analysis

This work provides a competitive solution for general audio tagging, addressing limited training data and noisy labels in the DCASE 2018 Task 2 challenge.

The paper tackles the DCASE 2018 Task 2 general audio tagging challenge with a system that combines convolutional neural networks, statistical features, ensemble learning, and sample re-weighting to cope with limited data and noisy labels. The system achieves a mean average precision (mAP@3) of 0.958 and ranked 1st and 4th out of 558 submissions on the public and private leaderboards, respectively.

Audio tagging aims to infer descriptive labels from audio clips, and it is challenging because of limited data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution include: (1) we investigated a variety of convolutional neural network architectures for the audio tagging task; (2) statistical features are applied to capture statistical patterns of audio features and improve classification performance; (3) ensemble learning is applied to combine the outputs of the deep classifiers and exploit complementary information; (4) a sample re-weighting strategy is employed during ensemble training to address the noisy label problem. Our system achieves a mean average precision (mAP@3) of 0.958, outperforming the baseline system's 0.704. Our system ranked 1st and 4th out of 558 submissions on the public and private leaderboards, respectively, of the DCASE 2018 Task 2 challenge. Our code is available at https://github.com/Cocoxili/DCASE2018Task2/.
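
For readers unfamiliar with the mAP@3 metric quoted above, the following is a minimal sketch of how it can be computed, assuming each clip has exactly one ground-truth label and the system submits up to three ranked guesses per clip. The function name `map_at_3` and the example labels are illustrative and are not taken from the authors' repository.

```python
from typing import Sequence


def map_at_3(true_labels: Sequence[str],
             ranked_predictions: Sequence[Sequence[str]]) -> float:
    """Mean average precision at 3, assuming one true label per clip.

    A clip scores 1/rank if its true label appears at position `rank`
    (1, 2, or 3) among the ranked guesses, and 0 otherwise. The final
    score is the mean over all clips.
    """
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("one prediction list is required per clip")
    if not true_labels:
        raise ValueError("at least one clip is required")

    total = 0.0
    for truth, guesses in zip(true_labels, ranked_predictions):
        # Only the first three guesses count; stop at the first hit.
        for rank, label in enumerate(guesses[:3], start=1):
            if label == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)


# Illustrative labels only: hits at rank 1, 2, 3, and one miss.
truth = ["Bark", "Cello", "Flute", "Applause"]
guesses = [
    ["Bark", "Cello", "Flute"],    # hit at rank 1 -> 1
    ["Flute", "Cello", "Bark"],    # hit at rank 2 -> 1/2
    ["Bark", "Cello", "Flute"],    # hit at rank 3 -> 1/3
    ["Bark", "Cello", "Flute"],    # miss          -> 0
]
score = map_at_3(truth, guesses)
```

Under this definition a score of 0.958 means the correct label is almost always the first guess, since any clip whose label appears only at rank 2 or 3 contributes at most 0.5.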
