SD LG AS Feb 7, 2022

Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study

arXiv:2202.03514v12.2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited training data in audio event detection. It is an incremental advance because it builds on existing methods, such as the Xception architecture and ablation analysis, rather than introducing a new one.

The study improves audio event detection on small datasets by combining knowledge transfer, pretraining, and data augmentation. It reports state-of-the-art accuracy on the ESC-50 dataset and presents a smaller model that comes close to that accuracy with almost a third of the parameters.

An Xception model reaches state-of-the-art (SOTA) accuracy on the ESC-50 dataset for audio event detection through knowledge transfer from ImageNet weights, pretraining on AudioSet, and an on-the-fly data augmentation pipeline. This paper presents an ablation study that analyzes which components contribute to the boost in performance and training time. A smaller Xception model is also presented which nears SOTA performance with almost a third of the parameters.
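The abstract mentions an on-the-fly data augmentation pipeline without listing its transforms here. As a minimal sketch of what "on-the-fly" means in general, the NumPy example below perturbs each spectrogram at the moment it is drawn into a batch, so every epoch sees different variants of the same small dataset. The function names (`augment`, `batches`), the two transforms (random time shift and frequency-band masking), and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def augment(spec, rng, max_shift=8, max_mask=4):
    """Return a randomly perturbed copy of a (freq, time) spectrogram.

    Illustrative transforms only; the paper's actual pipeline may differ.
    """
    out = spec.copy()
    # Random circular shift along the time axis.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift, axis=1)
    # Zero out a random band of adjacent frequency bins (width may be 0).
    width = int(rng.integers(0, max_mask + 1))
    start = int(rng.integers(0, out.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out

def batches(specs, labels, batch_size, rng):
    """Yield one epoch of shuffled batches, augmenting each example as it is drawn."""
    order = rng.permutation(len(specs))
    for i in range(0, len(order), batch_size):
        idx = order[i:i + batch_size]
        x = np.stack([augment(specs[j], rng) for j in idx])
        yield x, labels[idx]

# Synthetic stand-in data: 10 spectrograms of 64 frequency bins by 100 frames.
rng = np.random.default_rng(0)
specs = rng.random((10, 64, 100)).astype(np.float32)
labels = np.arange(10) % 5
epoch = list(batches(specs, labels, batch_size=4, rng=rng))
```

Because augmentation happens inside the batch generator, no augmented copies are stored on disk, and calling `batches` again for the next epoch produces fresh perturbations of the same source clips.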
