SD, AI · Sep 12, 2025

Improving Audio Event Recognition with Consistency Regularization

arXiv:2509.10391v1 · 4.0

Originality: Incremental advance

AI Analysis

This work addresses audio event recognition for applications such as sound classification. Its contribution is incremental: it adapts an existing regularization technique, consistency regularization from automatic speech recognition, to a new domain.

The paper tackles audio event recognition by applying consistency regularization to improve model performance on AudioSet. It shows consistent gains over supervised baselines for both small (~20k) and large (~1.8M) labeled training sets, and further improvements in a semi-supervised setup.

Consistency regularization (CR), which enforces agreement between a model's predictions on differently augmented views of the same input, has recently shown benefits in automatic speech recognition [1]. In this paper, we propose the use of consistency regularization for audio event recognition and demonstrate its effectiveness on AudioSet. With extensive ablation studies for both small (~20k) and large (~1.8M) supervised training sets, we show that CR brings consistent improvement over supervised baselines that already make heavy use of data augmentation, and that CR with stronger augmentation and multiple augmentations leads to additional gains for the small training set. Furthermore, we extend CR to the semi-supervised setup with 20k labeled samples and 1.8M unlabeled samples, and obtain a performance improvement over our best model trained on the small set.
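The abstract does not give the exact form of the CR loss, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: a toy linear multi-label tagger with sigmoid outputs (AudioSet tagging is multi-label), a hypothetical `augment` function (random gain plus Gaussian noise), and a symmetric Bernoulli KL divergence as the agreement term. The names `augment`, `consistency_loss`, `total_loss`, and `cr_weight` are illustrative, not taken from the paper. In the semi-supervised case of this sketch, unlabeled clips contribute only to the consistency term.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy for multi-label targets."""
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def bernoulli_kl(p, q, eps=1e-7):
    """Per-class KL(Bern(p) || Bern(q)), averaged over examples and classes."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(np.mean(kl))

def consistency_loss(p1, p2):
    """Symmetric KL between predictions on two views (an assumed choice)."""
    return 0.5 * (bernoulli_kl(p1, p2) + bernoulli_kl(p2, p1))

def augment(x, rng, noise_std=0.1):
    """Hypothetical augmentation: random per-example gain plus Gaussian noise."""
    gain = rng.uniform(0.8, 1.2, size=(x.shape[0], 1))
    return x * gain + rng.normal(0.0, noise_std, size=x.shape)

def total_loss(W, x_lab, y_lab, x_unlab, rng, cr_weight=1.0):
    """Supervised BCE on labeled views plus CR on all (labeled + unlabeled) views."""
    x_all = np.concatenate([x_lab, x_unlab], axis=0)
    # Two independently augmented views of every example, same model weights.
    p1 = sigmoid(augment(x_all, rng) @ W)
    p2 = sigmoid(augment(x_all, rng) @ W)
    n = x_lab.shape[0]
    sup = 0.5 * (bce(p1[:n], y_lab) + bce(p2[:n], y_lab))
    cr = consistency_loss(p1, p2)
    return sup + cr_weight * cr, sup, cr

# Toy usage: 8 labeled and 32 unlabeled examples, 16-dim features, 5 classes.
rng = np.random.default_rng(0)
d, c = 16, 5
W = rng.normal(0.0, 0.1, size=(d, c))
x_lab = rng.normal(size=(8, d))
y_lab = (rng.random((8, c)) < 0.3).astype(float)
x_unlab = rng.normal(size=(32, d))
loss, sup, cr = total_loss(W, x_lab, y_lab, x_unlab, rng)
```

Passing an empty unlabeled batch, e.g. `np.empty((0, d))`, reduces the sketch to the purely supervised CR setting; `cr_weight` trades off the agreement term against the supervised loss.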

