SD · AI · Sep 12, 2025

Improving Audio Event Recognition with Consistency Regularization

arXiv:2509.10391v1
Originality: Incremental advance
AI Analysis

This work addresses audio event recognition, with applications such as sound classification. The advance is incremental: it adapts an existing regularization technique to a new domain.

The paper applies consistency regularization to audio event recognition on AudioSet. It reports consistent gains over supervised baselines on training sets of up to 1.8M samples, and further improvement in a semi-supervised setup.

Consistency regularization (CR), which enforces agreement between model predictions on augmented views, has recently shown benefits in automatic speech recognition [1]. In this paper, we propose the use of consistency regularization for audio event recognition, and demonstrate its effectiveness on AudioSet. With extensive ablation studies for both small ($\sim$20k) and large ($\sim$1.8M) supervised training sets, we show that CR brings consistent improvement over supervised baselines that already make heavy use of data augmentation, and that CR with stronger augmentation and multiple augmentations leads to additional gain for the small training set. Furthermore, we extend CR to the semi-supervised setup with 20k labeled samples and 1.8M unlabeled samples, and obtain a performance improvement over our best model trained on the small set.
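
To make "agreement between model predictions on augmented views" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes multi-label sigmoid outputs (an AudioSet clip can carry several labels), a symmetric KL divergence as the consistency term, and a hypothetical weight `cr_weight`. The abstract does not state the actual loss or weighting, so all function names and choices below are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bernoulli_kl(p, q, eps=1e-7):
    """KL(Bernoulli(p) || Bernoulli(q)) for one class probability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between per-class predictions on two augmented views.

    Assumption: the two logit lists come from the same clip passed through
    the model under two different augmentations.
    """
    total = 0.0
    for za, zb in zip(logits_a, logits_b):
        pa, pb = sigmoid(za), sigmoid(zb)
        total += 0.5 * (bernoulli_kl(pa, pb) + bernoulli_kl(pb, pa))
    return total / len(logits_a)


def bce_loss(logits, labels, eps=1e-7):
    """Mean binary cross-entropy over classes (multi-label supervision)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(logits)


def total_loss(logits_a, logits_b, labels=None, cr_weight=1.0):
    """Supervised loss on both views plus a weighted consistency term.

    With labels=None (an unlabeled clip, as in a semi-supervised setup),
    only the consistency term contributes.
    """
    cr = consistency_loss(logits_a, logits_b)
    if labels is None:
        return cr_weight * cr
    sup = 0.5 * (bce_loss(logits_a, labels) + bce_loss(logits_b, labels))
    return sup + cr_weight * cr
```

In this sketch, labeled clips contribute both terms, while unlabeled clips contribute only the consistency term. That is one simple way to let a large unlabeled pool shape training alongside a small labeled set.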
