SD AI AS
Feb 27, 2023

HalluAudio: Hallucinating Frequency as Concepts for Few-Shot Audio Classification

Zhongjie Yu, Shuyang Wang, Lin Chen, Zhongwei Cheng

arXiv:2302.14204v14.26 citations
h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses few-shot audio classification for researchers, offering an interpretable method that improves over existing approaches, though it appears incremental in focusing on audio-specific formats.

The authors tackled few-shot audio classification by hallucinating high- and low-frequency parts as structured concepts, leveraging audio spectrogram specificity, and achieved notable performance gains over baselines on ESC-50 and a curated Kaggle18 dataset.

Few-shot audio classification is an emerging topic that is attracting growing attention from the research community. Most existing work ignores the specific form of the audio spectrogram and relies largely on embedding spaces borrowed from image tasks. In this work, we take advantage of this audio-specific format and propose a new method that hallucinates high-frequency and low-frequency parts as structured concepts. Extensive experiments on ESC-50 and our curated balanced Kaggle18 dataset show that the proposed method outperforms the baseline by a notable margin. Hallucinating high-frequency and low-frequency parts in this way also makes the method interpretable and opens up new possibilities for few-shot audio classification.
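
To make the idea of treating frequency ranges as separate parts more concrete, the sketch below splits a spectrogram into a low-frequency and a high-frequency band. It is only an illustration of the band split, not the authors' hallucination method, which the abstract does not detail. The function name `split_frequency_bands`, the `cutoff_bin` parameter, the midpoint default, and the assumption that row 0 holds the lowest frequency bin are all hypothetical choices made for this example.

```python
import numpy as np


def split_frequency_bands(spec, cutoff_bin=None):
    """Split a spectrogram of shape (n_freq_bins, n_frames) into two bands.

    Assumes row 0 is the lowest frequency bin. The midpoint default for
    cutoff_bin is an illustrative choice, not a value taken from the paper.
    """
    n_bins = spec.shape[0]
    if cutoff_bin is None:
        cutoff_bin = n_bins // 2
    if not 0 < cutoff_bin < n_bins:
        raise ValueError("cutoff_bin must lie strictly between 0 and n_freq_bins")
    low = spec[:cutoff_bin, :]   # rows below the cutoff: low-frequency part
    high = spec[cutoff_bin:, :]  # rows at or above the cutoff: high-frequency part
    return low, high


# Stand-in for a real log-mel spectrogram: 128 frequency bins, 431 frames.
rng = np.random.default_rng(0)
spec = rng.random((128, 431))
low, high = split_frequency_bands(spec)
```

Each band could then be passed to its own encoder or compared separately against class prototypes, which is one plausible way a frequency-based split could support interpretability.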
