SDAIAS | Jan 7, 2025

LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

arXiv:2501.03464v2 | 5 citations | h-index: 42 | ICASSP
AI Analysis

This work addresses audio classification and tagging for researchers and practitioners. It offers an efficient alternative to Transformers in data-scarce settings, although it appears incremental because it builds on existing graph neural network concepts.

The paper targets a limitation of Transformers: they model pairwise relations and so struggle to capture the higher-order relations needed to identify audio objects. It introduces LHGNN, a graph-based model that combines local neighbourhood information with higher-order information derived from Fuzzy C-Means clusters. The model outperforms Transformer-based models on three audio datasets while using fewer parameters, and its advantage is most pronounced when no ImageNet pretraining is used.

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
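The abstract describes two information sources per graph node: a local neighbourhood and higher-order context from Fuzzy C-Means (FCM) clusters. The pure-Python sketch below illustrates that combination under stated assumptions. Nodes are treated as patch embeddings (lists of floats), the local neighbourhood is a k-nearest-neighbour graph aggregated by a mean, and the higher-order message is a membership-weighted sum of FCM cluster centres, fused with the local message by simple averaging. The function names (`knn_mean`, `fuzzy_c_means`, `lhgnn_like_layer`) and the fusion rule are hypothetical illustrations, not the authors' LHGNN layer, whose exact formulation this summary does not specify. Only the FCM update itself is the standard algorithm.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_mean(nodes, k):
    """Local message (assumed): mean of each node and its k nearest neighbours."""
    out = []
    for i, x in enumerate(nodes):
        others = sorted((j for j in range(len(nodes)) if j != i),
                        key=lambda j: dist2(x, nodes[j]))
        group = [x] + [nodes[j] for j in others[:k]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def fuzzy_c_means(nodes, c, m=2.0, iters=50, seed=0, eps=1e-12):
    """Standard Fuzzy C-Means; returns (centres, memberships u[i][j])."""
    rng = random.Random(seed)
    n, dim = len(nodes), len(nodes[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(iters):
        # Centre update: weighted mean with weights u_ij ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centres.append([sum(w[i] * nodes[i][d] for i in range(n)) / tw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        for i in range(n):
            dists = [math.sqrt(dist2(nodes[i], centres[j])) + eps for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dists[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
    return centres, u


def lhgnn_like_layer(nodes, k=2, c=2):
    """Hypothetical fusion: average of local k-NN mean and soft cluster context."""
    local = knn_mean(nodes, k)
    centres, u = fuzzy_c_means(nodes, c)
    dim = len(nodes[0])
    higher = [[sum(u[i][j] * centres[j][d] for j in range(c)) for d in range(dim)]
              for i in range(len(nodes))]
    fused = [[0.5 * (a + b) for a, b in zip(lo, hi)] for lo, hi in zip(local, higher)]
    return fused, u


# Toy "patch embeddings": two well-separated groups of three nodes each.
pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
out, u = lhgnn_like_layer(pts, k=2, c=2)
```

On this toy input, the soft memberships `u` should place the first three nodes and the last three nodes in different clusters. Because FCM memberships are soft, every node receives some context from every cluster, which is one way a model can relate a node to a whole group of nodes at once rather than to a single pairwise neighbour.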
