LG · SD · AS · ML · May 6, 2019

Zero-Shot Audio Classification Based on Class Label Embeddings

arXiv:1905.01926v2 · 32 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of classifying audio without labeled examples for new classes. The contribution is incremental: it adapts existing bilinear models and embeddings to audio.

The paper tackles zero-shot audio classification by using textual class label embeddings in place of audio samples from the target classes. On the ESC-50 dataset it achieves an average accuracy of 26%, above the 10% random-guess baseline in each audio category, and reaches up to 39.7% for natural audio classes.

This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It achieves accuracy (26% on average) better than random guessing (10%) in each audio category. In particular, it reaches up to 39.7% for the category of natural audio classes.
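The bilinear compatibility described in the abstract can be illustrated with a minimal sketch. It assumes the common bilinear form F(x, y) = xᵀWy, where x is an audio embedding, y is a class label embedding, and W is a matrix already learned on classes that do have audio. The abstract does not spell out the exact formulation or training loss, so this shows only the scoring and prediction step. The names `compatibility` and `predict` and all numeric values are hypothetical toy stand-ins, not real VGGish or Word2Vec outputs.

```python
from typing import Dict, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def compatibility(audio_emb: Vector, W: Matrix, label_emb: Vector) -> float:
    """Bilinear score x^T W y between an audio and a label embedding."""
    # Project the audio embedding into the label-embedding space: p = W^T x.
    projected = [
        sum(audio_emb[i] * W[i][j] for i in range(len(audio_emb)))
        for j in range(len(label_emb))
    ]
    # Dot product of the projection with the label embedding.
    return sum(p * y for p, y in zip(projected, label_emb))


def predict(audio_emb: Vector, W: Matrix, label_embs: Dict[str, Vector]) -> str:
    """Return the class whose label embedding is most compatible with the clip."""
    return max(
        label_embs,
        key=lambda name: compatibility(audio_emb, W, label_embs[name]),
    )


# Assumption: W was learned beforehand on seen classes (training not shown).
# Shape is audio_dim (3) x label_dim (2); real embeddings are much larger.
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, -0.5],
]

# Label embeddings for unseen classes: no audio from these classes is needed.
unseen_labels = {"rain": [1.0, 0.0], "dog": [0.0, 1.0]}

# Audio embedding of one test clip (toy values).
clip = [0.2, 0.1, 0.8]

scores = {name: compatibility(clip, W, emb) for name, emb in unseen_labels.items()}
prediction = predict(clip, W, unseen_labels)
print(scores, prediction)
```

Because the target classes enter only through their label embeddings, adding a new class at test time means adding one more entry to `unseen_labels`, with no retraining of `W` in this sketch.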

