AS, AI, SD | Jul 19, 2022

GAFX: A General Audio Feature eXtractor

arXiv:2207.09145v1 | 2 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the need for more flexible audio features in machine learning, though it is incremental as it builds on existing architectures like U-Net and ResNet.

The paper asks whether deep learning-based features can replace spectrograms in audio tasks. It finds that the proposed General Audio Feature eXtractor (GAFX) models, particularly GAFX-U, achieve competitive performance on music genre classification using the GTZAN dataset.

Most machine learning models for audio tasks operate on a handcrafted feature, the spectrogram. However, it is still unknown whether the spectrogram could be replaced with deep learning-based features. In this paper, we answer this question by comparing different learnable feature-extracting neural networks against a successful spectrogram model, and we propose a General Audio Feature eXtractor (GAFX) based on dual U-Net (GAFX-U), ResNet (GAFX-R), and Attention (GAFX-A) modules. We design experiments to evaluate this model on the music genre classification task on the GTZAN dataset and perform a detailed ablation study of different configurations of our framework. Our model GAFX-U, followed by the Audio Spectrogram Transformer (AST) classifier, achieves competitive performance.
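
To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It computes a handcrafted magnitude spectrogram and, next to it, a hypothetical one-layer learnable front end that emits features of the same shape. The function names, frame length, hop size, sample rate, and the random filter bank are all assumptions chosen for illustration. The real GAFX modules (U-Net, ResNet, Attention) are far deeper, and their filters would be trained rather than randomly initialised.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectrogram(x, frame_len=256, hop=128):
    """Handcrafted feature: magnitude of a Hann-windowed short-time Fourier transform."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)


def learnable_frontend(x, filters, hop=128):
    """Hypothetical learnable stand-in: project each frame onto a filter bank, then ReLU.

    This is only a one-layer placeholder, NOT the GAFX architecture.
    In practice `filters` would be updated by gradient descent.
    """
    frames = frame_signal(x, filters.shape[1], hop)
    return np.maximum(frames @ filters.T, 0.0)  # (n_frames, n_filters)


# Assumed toy input: one second of a 440 Hz sine at an 8 kHz sample rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(x)

# Random initial filters; 129 filters so the output matches the spectrogram's width.
rng = np.random.default_rng(0)
filters = rng.standard_normal((129, 256)) * 0.01
feat = learnable_frontend(x, filters)

print(spec.shape, feat.shape)
```

Because both functions return an array of shape (frames, channels), the same downstream classifier could in principle consume either one. This mirrors the setup the abstract describes, where a learned extractor takes the place of the spectrogram in front of a fixed classifier.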

