IR · Jan 15, 2020
Deep Learning for MIR Tutorial
Alexander Schindler, Thomas Lidy, Sebastian Böck
Deep Learning has become the state of the art in visual computing and is increasingly adopted in the Music Information Retrieval (MIR) and audio retrieval domain. To bring attention to this topic, we propose an introductory tutorial on deep learning for MIR. Besides a general introduction to neural networks, the proposed tutorial covers a wide range of MIR-relevant deep learning approaches. Convolutional Neural Networks are currently a de facto standard for deep-learning-based audio retrieval. Recurrent Neural Networks have proven effective in onset detection tasks such as beat or audio-event detection. Siamese Networks have been shown to be effective in learning audio representations and distance functions specific to music similarity retrieval. We will incorporate both academic and industrial points of view into the tutorial. Accompanying the tutorial, we will create a GitHub repository for the content presented, as well as references to state-of-the-art work and literature for further reading. This repository will remain public after the conference.
SD · Nov 11, 2018
A Multi-modal Deep Neural Network approach to Bird-song identification
Botond Fazeka, Alexander Schindler, Thomas Lidy et al.
We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) with four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened output of the convolutional layers and the output of the metadata's fully connected layer are joined and fed into a further fully connected layer. The resulting architecture ranked 2nd, 3rd and 4th in the BirdCLEF2017 task in various training configurations.
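The fusion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "joined" means plain concatenation, uses ReLU activations, and replaces the four convolutional layers with random stand-in feature maps. All names (`flatten`, `dense`, `init_dense`, `fuse`) and all layer sizes are hypothetical.

```python
import random


def flatten(feature_maps):
    """Flatten feature maps shaped (channels, rows, cols) into one vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(x, weights, bias):
    """Fully connected layer with ReLU (assumed activation)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_dense(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in -> n_out."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse(conv_maps, metadata, meta_layer, joint_layer):
    """Join the flattened audio features with the processed metadata."""
    audio_vec = flatten(conv_maps)            # output of the audio branch
    meta_vec = dense(metadata, *meta_layer)   # metadata branch
    joined = audio_vec + meta_vec             # concatenation (assumption)
    return dense(joined, *joint_layer)        # shared fully connected layer


rng = random.Random(0)
# Stand-in for the output of the last convolutional layer: 4 maps of 2x3.
conv_maps = [[[rng.random() for _ in range(3)] for _ in range(2)]
             for _ in range(4)]
metadata = [0.5, -1.0, 2.0]                   # hypothetical metadata features
meta_layer = init_dense(3, 5, rng)
joint_layer = init_dense(4 * 2 * 3 + 5, 8, rng)
out = fuse(conv_maps, metadata, meta_layer, joint_layer)
```

Here `out` holds the eight activations of the joint layer; in a real model a classification layer over the bird species would follow.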
SD · Nov 11, 2018
Multi-Temporal Resolution Convolutional Neural Networks for Acoustic Scene Classification
Alexander Schindler, Thomas Lidy, Andreas Rauber
In this paper we present a Deep Neural Network architecture for the task of acoustic scene classification which harnesses information from increasing temporal resolutions of Mel-Spectrogram segments. This architecture is composed of separate, parallel Convolutional Neural Networks which learn spectral and temporal representations for each input resolution. The resolutions are chosen to cover fine-grained characteristics of a scene's spectral texture as well as its distribution of acoustic events. The proposed model shows an absolute improvement of 3.56% over the best-performing single-resolution model and of 12.49% over the DCASE 2017 Acoustic Scenes Classification task baseline.
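As a rough illustration of the parallel-branch idea, the sketch below cuts segments of increasing length from one Mel-spectrogram, passes each through its own branch, and joins the branch outputs. It is a sketch under stated assumptions, not the paper's model: the branch is a mean-over-time stand-in for a CNN, the segment lengths `(2, 4, 8)` are arbitrary, and the function names are hypothetical.

```python
def crop(spectrogram, n_frames, start=0):
    """Return n_frames consecutive frames beginning at index start."""
    if start + n_frames > len(spectrogram):
        raise ValueError("segment exceeds spectrogram length")
    return spectrogram[start:start + n_frames]


def branch(segment):
    """Stand-in for one CNN branch: average each Mel band over time."""
    n = len(segment)
    return [sum(frame[b] for frame in segment) / n
            for b in range(len(segment[0]))]


def multi_resolution_features(spectrogram, resolutions):
    """Run one branch per temporal resolution and join the outputs."""
    features = []
    for n_frames in resolutions:
        features.extend(branch(crop(spectrogram, n_frames)))
    return features


n_mels = 4
# Toy spectrogram: 16 frames with 4 Mel bands each, value = frame + band.
spectrogram = [[float(t + b) for b in range(n_mels)] for t in range(16)]
features = multi_resolution_features(spectrogram, resolutions=(2, 4, 8))
```

Each resolution contributes `n_mels` values, so `features` has 12 entries here; a classifier would consume the joined vector.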
CV · Nov 11, 2018
Fashion and Apparel Classification using Convolutional Neural Networks
Alexander Schindler, Thomas Lidy, Stephan Karner et al.
We present an empirical study of applying deep Convolutional Neural Networks (CNN) to the task of fashion and apparel image classification to improve metadata enrichment for e-commerce applications. Five different CNN architectures were analyzed using clean and pre-trained models. The models were evaluated on three different tasks (person detection, product classification and gender classification) on two datasets of small and large scale.