A Multi-modal Deep Neural Network approach to Bird-song identification
This work addresses bird song identification for ornithologists and ecologists, but it is incremental as it builds on existing multi-modal and deep learning methods.
The paper tackles bird song identification with a multi-modal deep neural network that processes both audio and metadata, ranking 2nd, 3rd, and 4th in the BirdCLEF2017 task.
We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) with four convolutional layers, while the accompanying metadata is processed by fully connected layers. The flattened output of the convolutional layers and the output of the metadata's fully connected layer are joined and fed into a further fully connected layer. In various training configurations, the resulting architecture achieved 2nd, 3rd, and 4th rank in the BirdCLEF2017 task.
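
The two-branch design described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the single-channel 3x3 kernels, the absence of pooling, the ReLU activations, the use of concatenation as the joining step, the softmax output, and all sizes and names (`conv2d`, `dense`, `init_params`, `forward`, `demo`) are assumptions made purely for illustration, and the weights are random rather than trained.

```python
import math
import random


def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                         for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dense(vector, weights, relu=True):
    """Fully connected layer without bias; `weights` has one row per output unit."""
    out = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return [max(0.0, v) for v in out] if relu else out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Matrix of small random values, standing in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(spec_shape, n_meta, n_meta_hidden, n_classes, rng, kernel_size=3):
    """Hypothetical parameter set; every size here is an assumption."""
    h, w = spec_shape
    shrink = 4 * (kernel_size - 1)  # four valid convolutions shrink each side
    n_audio = (h - shrink) * (w - shrink)
    return {
        "conv_kernels": [random_matrix(kernel_size, kernel_size, rng) for _ in range(4)],
        "meta_weights": random_matrix(n_meta_hidden, n_meta, rng),
        "fusion_weights": random_matrix(n_classes, n_audio + n_meta_hidden, rng),
    }


def forward(spectrogram, metadata, params):
    # Audio branch: four convolutional layers, then flatten to a vector.
    x = spectrogram
    for kernel in params["conv_kernels"]:
        x = conv2d(x, kernel)
    audio_features = [v for row in x for v in row]

    # Metadata branch: one fully connected layer.
    meta_features = dense(metadata, params["meta_weights"])

    # Fusion: join both feature vectors (concatenation is assumed here) and
    # feed them into a fully connected layer that yields class scores.
    joined = audio_features + meta_features
    logits = dense(joined, params["fusion_weights"], relu=False)
    return softmax(logits)


def demo():
    """Run one forward pass on random stand-in data for five hypothetical classes."""
    rng = random.Random(0)
    spectrogram = random_matrix(12, 12, rng)  # stand-in for an audio spectrogram
    metadata = [0.3, -0.1, 0.8]               # stand-in for encoded metadata
    params = init_params((12, 12), n_meta=3, n_meta_hidden=4, n_classes=5, rng=rng)
    return forward(spectrogram, metadata, params)
```

Calling `demo()` returns one probability per hypothetical species class. In this sketch the joint layer sees both modalities at once, so its weights can in principle let metadata shift the scores suggested by the audio features.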