SD | LG | AS | Mar 26, 2025

Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging

arXiv:2503.21826v1 | 2 citations | h-index: 3 | ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses annotation inconsistencies in audio tagging datasets such as AudioSet and provides a method that boosts model performance, particularly for smaller models. It is an incremental advance because it builds on existing label propagation techniques.

The paper tackled inconsistent annotations in AudioSet by applying Hierarchical Label Propagation (HLP) to propagate labels up the ontology hierarchy, resulting in a mean increase in positive labels per audio clip from 1.98 to 2.39 and performance improvements across various model architectures, with smaller models showing more gains.

AudioSet is one of the most used and largest datasets in audio tagging, containing about 2 million audio samples that are manually labeled with 527 event categories organized into an ontology. However, the annotations contain inconsistencies, particularly where categories that should be labeled as positive according to the ontology are frequently mislabeled as negative. To address this issue, we apply Hierarchical Label Propagation (HLP), which propagates labels up the ontology hierarchy, resulting in a mean increase in positive labels per audio clip from 1.98 to 2.39 and affecting 109 out of the 527 classes. Our results demonstrate that HLP provides performance benefits across various model architectures, including convolutional neural networks (PANN's CNN6 and ConvNeXT) and transformers (PaSST), with smaller models showing more improvements. Finally, on FSD50K, another widely used dataset, models trained on AudioSet with HLP consistently outperformed those trained without HLP. Our source code will be made available on GitHub.
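As a rough illustration of what "propagating labels up the ontology hierarchy" could look like, the sketch below marks every ancestor of a clip's positive labels as positive too. It is not the authors' implementation: the toy ontology, the parent-to-children dictionary layout, and the function names `build_parent_map` and `propagate_labels` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy ontology (parent -> children); NOT the real AudioSet ontology.
TOY_ONTOLOGY = {
    "Animal": ["Domestic animals", "Wild animals"],
    "Domestic animals": ["Dog", "Cat"],
    "Dog": ["Bark"],
}


def build_parent_map(children_of):
    """Invert a parent -> children mapping into child -> set of parents."""
    parents_of = defaultdict(set)
    for parent, children in children_of.items():
        for child in children:
            parents_of[child].add(parent)
    return parents_of


def propagate_labels(labels, parents_of):
    """Return the labels plus all of their ancestors in the ontology."""
    propagated = set(labels)
    stack = list(labels)
    while stack:
        node = stack.pop()
        # .get avoids inserting new keys into the defaultdict.
        for parent in parents_of.get(node, ()):
            if parent not in propagated:
                propagated.add(parent)
                stack.append(parent)
    return propagated


parents_of = build_parent_map(TOY_ONTOLOGY)
clip_labels = {"Bark", "Speech"}  # assumed original annotation for one clip
print(sorted(propagate_labels(clip_labels, parents_of)))
# -> ['Animal', 'Bark', 'Dog', 'Domestic animals', 'Speech']
```

In this toy case the clip goes from two positive labels to five, which mirrors in miniature the reported rise in mean positive labels per clip from 1.98 to 2.39. Labels with no ancestors in the ontology, such as "Speech" here, are left unchanged.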

Code Implementations: 1 repo