SD AI AS

Jan 7, 2025

LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Shubhr Singh, Emmanouil Benetos, Huy Phan, Dan Stowell

arXiv:2501.03464v29.35 citations h-index: 32 ICASSP

Originality: Incremental advance

AI Analysis

This work addresses audio classification and tagging for researchers and practitioners, offering an efficient alternative to Transformers in data-scarce settings, though it appears incremental because it builds on existing graph neural network concepts.

The paper tackles the limitation of Transformers in capturing higher-order relations for audio object identification by introducing LHGNN, a graph-based model that integrates local neighborhood information with higher-order data from Fuzzy C-Means clusters. The model outperforms Transformer-based models across three audio datasets with fewer parameters and shows advantages without ImageNet pretraining.

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
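The idea of combining local neighbourhood information with higher-order information from Fuzzy C-Means clusters can be illustrated with a minimal NumPy sketch. This is not the authors' LHGNN layer: the function names (`fuzzy_c_means`, `knn_mean`, `local_higher_order_features`), the k-nearest-neighbour mean as the local aggregator, the concatenation step, and all hyperparameters are assumptions chosen for illustration. Only the Fuzzy C-Means update rules themselves are the standard textbook ones.

```python
import numpy as np


def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=50, seed=0):
    """Standard Fuzzy C-Means on rows of x. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random soft assignment, normalised so each row sums to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means (weights raised to fuzzifier m).
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u


def knn_mean(x, k):
    """Local context: mean of each node's k nearest neighbours (self excluded)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return x[idx].mean(axis=1)


def local_higher_order_features(x, k=4, n_clusters=3):
    """Hypothetical fusion: concatenate node, local, and soft-cluster features."""
    local = knn_mean(x, k)
    centers, u = fuzzy_c_means(x, n_clusters)
    # Each node receives a membership-weighted mix of cluster centers, so it
    # sees group-level (beyond pairwise) context.
    higher = u @ centers
    return np.concatenate([x, local, higher], axis=1)


# Toy input: 32 nodes (e.g. spectrogram patches) with 8-dimensional features.
x = np.random.default_rng(1).normal(size=(32, 8))
out = local_higher_order_features(x)
print(out.shape)
```

Because cluster memberships are soft, every node contributes to and receives from every cluster, which is one way a node can relate to a whole group of nodes at once instead of to single neighbours.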

