Densely Connected CNNs for Bird Audio Detection
This work addresses the problem of automating bird sound detection for bioacoustics researchers, but its contribution is incremental: it applies existing neural network architectures to a specific challenge dataset.
The paper tackles bird audio detection by evaluating several convolutional neural networks and finds that DenseNets perform best, reaching an 88.22% AUC score on the challenge test set with the help of data augmentation and ensemble methods.
Automatic detection of bird sounds in audio recordings, if accurate enough, is expected to be of great help to the research community working in bio- and ecoacoustics, which is interested in monitoring biodiversity based on audio field recordings. To estimate how accurate state-of-the-art machine learning approaches are, the Bird Audio Detection challenge, involving large audio datasets, was recently organized. In this paper, experiments using several types of convolutional neural networks (standard CNNs, residual nets and densely connected nets) are reported in the framework of this challenge. DenseNets were the preferred solution since they were both the best performing and the most compact models, leading to an 88.22% area under the receiver operating characteristic curve (AUC) on the test set of the challenge. Performance gains were obtained thanks to data augmentation through time and frequency shifting, model parameter averaging during training, and ensemble methods using the geometric mean. In contrast, attempts to enlarge the training dataset with test-set samples, using automatic predictions as pseudo-ground-truth labels, consistently degraded performance.
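Two of the techniques named above, shift-based augmentation and geometric-mean ensembling, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the (frequency bins, time frames) spectrogram layout, and the choice of circular (wrap-around) shifts are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def shift_spectrogram(spec, time_shift, freq_shift):
    """Shift a (freq_bins, time_frames) spectrogram along both axes.

    Assumption: shifts wrap around (circular); the abstract does not say
    how boundaries are handled.
    """
    return np.roll(spec, shift=(freq_shift, time_shift), axis=(0, 1))


def geometric_mean_ensemble(probs, eps=1e-12):
    """Combine per-model probabilities of shape (n_models, n_clips).

    The geometric mean is computed in log space; probabilities are clipped
    to [eps, 1] so that log(0) never occurs.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return np.exp(np.log(probs).mean(axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.random((4, 6))  # hypothetical toy spectrogram
    augmented = shift_spectrogram(spec, time_shift=2, freq_shift=1)
    print(augmented.shape)

    # Hypothetical predictions of three models for two clips.
    model_probs = [[0.9, 0.2], [0.8, 0.1], [0.95, 0.3]]
    print(geometric_mean_ensemble(model_probs))
```

Compared with the arithmetic mean, the geometric mean is pulled down more strongly by a single low score, so a clip receives a high ensemble score only when the models broadly agree.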