SD · CL · LG · AS · Nov 9, 2020

Knowledge Distillation for Singing Voice Detection

arXiv:2011.04297v2
AI Analysis

This work addresses a practical deployment problem for music information retrieval applications on smartphones and embedded sensors, but its contribution is incremental: it adapts existing knowledge distillation methods to a new domain.

The paper tackles the problem of deploying large deep neural networks for Singing Voice Detection on resource-constrained devices by applying knowledge distillation techniques, achieving model compression with a smaller student network trained from a teacher network.

Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods exist in the literature, one based on a CNN and the other on an RNN, that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both models have a large number of parameters (1.4M for the CNN and 65.7K for the RNN) and are hence not suitable for deployment on devices such as smartphones or embedded sensors, which have limited memory and computation power. The most popular method for addressing this issue (in addition to model compression) is known in the deep learning literature as knowledge distillation: a large pre-trained network, the teacher, is used to train a smaller student network. Despite the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. This paper investigates the issue using both conventional and ensemble knowledge distillation techniques.
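To make the teacher/student idea concrete, here is a minimal sketch of a distillation loss for a two-class (non-vocal vs. vocal) frame classifier. It assumes the widely used temperature-softened formulation of knowledge distillation and models the ensemble case by averaging the softened outputs of several teachers. The abstract does not state the paper's exact loss, so the function names, the `temperature` and `alpha` parameters, and the averaging rule are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def average_soft_targets(teacher_logits_list, temperature):
    """Assumed ensemble rule: average the softened outputs of all teachers."""
    softened = [softmax(logits, temperature) for logits in teacher_logits_list]
    return [sum(column) / len(softened) for column in zip(*softened)]


def distillation_loss(student_logits, teacher_logits_list, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher) term and a hard (ground-truth) term.

    A single-element teacher_logits_list gives conventional distillation;
    several elements give the ensemble variant sketched here.
    """
    soft_target = average_soft_targets(teacher_logits_list, temperature)
    student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable to the hard term.
    soft_loss = cross_entropy(soft_target, student_soft) * temperature ** 2

    one_hot = [1.0 if i == hard_label else 0.0
               for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    # Hypothetical logits for one audio frame: index 0 = non-vocal, 1 = vocal.
    cnn_teacher = [-1.5, 2.5]
    rnn_teacher = [-0.5, 1.5]
    student = [-1.0, 2.0]
    print(distillation_loss(student, [cnn_teacher], hard_label=1))
    print(distillation_loss(student, [cnn_teacher, rnn_teacher], hard_label=1))
```

In a real training loop this scalar would be computed per batch in an autodiff framework and minimized with respect to the student's weights only, with the teachers kept frozen.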

Code Implementations: 1 repo