LG SD AS | Jan 10, 2024

VI-PANN: Harnessing Transfer Learning and Uncertainty-Aware Variational Inference for Improved Generalization in Audio Pattern Recognition

John Fischer, Marko Orescanin, Eric Eckstrand

arXiv:2401.05531v2 | 4.67 citations | h-index: 3 | Has Code | IEEE Access

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable uncertainty estimation in audio classification, particularly for tasks with limited data. The advance is incremental: it adapts an existing method, variational inference, to a new context.

The paper tackles two weaknesses of deterministic transfer learning models for audio pattern recognition: uncalibrated predictions and the absence of epistemic uncertainty estimates. It proposes VI-PANNs, a variational inference variant of ResNet-54 pre-trained on AudioSet, and demonstrates that calibrated uncertainty transfers to downstream tasks such as ESC-50, UrbanSound8K, and DCASE2013.

Transfer learning (TL) is an increasingly popular approach to training deep learning (DL) models that leverages the knowledge gained by training a foundation model on diverse, large-scale datasets for use on downstream tasks where less domain- or task-specific data is available. The literature is rich with TL techniques and applications; however, the bulk of the research makes use of deterministic DL models which are often uncalibrated and lack the ability to communicate a measure of epistemic (model) uncertainty in prediction. Unlike their deterministic counterparts, Bayesian DL (BDL) models are often well-calibrated, provide access to epistemic uncertainty for a prediction, and are capable of achieving competitive predictive performance. In this study, we propose variational inference pre-trained audio neural networks (VI-PANNs). VI-PANNs are a variational inference variant of the popular ResNet-54 architecture which are pre-trained on AudioSet, a large-scale audio event detection dataset. We evaluate the quality of the resulting uncertainty when transferring knowledge from VI-PANNs to other downstream acoustic classification tasks using the ESC-50, UrbanSound8K, and DCASE2013 datasets. We demonstrate, for the first time, that it is possible to transfer calibrated uncertainty information along with knowledge from upstream tasks to enhance a model's capability to perform downstream tasks.
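To make the notion of epistemic uncertainty concrete, the following minimal sketch shows one common way to separate it from aleatoric uncertainty given several stochastic forward passes of a Bayesian classifier. It assumes a multi-class (softmax) downstream task and uses the entropy-based decomposition that is widespread in the Bayesian deep learning literature; the abstract does not confirm that this is the exact formulation used for VI-PANNs. The function name `decompose_uncertainty` and the array layout are hypothetical, and drawing the samples from a variational model is assumed to happen elsewhere.

```python
import numpy as np


def decompose_uncertainty(probs, eps=1e-12):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    probs: array of shape (T, N, C) holding class probabilities from
           T stochastic forward passes over N examples with C classes.
           (Hypothetical layout, chosen for this illustration.)
    Returns three arrays of shape (N,): total, aleatoric, epistemic.
    """
    probs = np.asarray(probs, dtype=float)

    # Average the sampled predictions to approximate the predictive distribution.
    mean_probs = probs.mean(axis=0)  # shape (N, C)

    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)

    # Aleatoric uncertainty: average entropy of each individual sample.
    per_sample_entropy = -np.sum(probs * np.log(probs + eps), axis=-1)  # (T, N)
    aleatoric = per_sample_entropy.mean(axis=0)

    # Epistemic uncertainty: the gap between the two (mutual information).
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Example 0: both passes agree on a 50/50 split (ambiguous input).
# Example 1: the passes confidently disagree (the model itself is unsure).
samples = np.array([
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.5, 0.5], [0.0, 1.0]],
])
total, aleatoric, epistemic = decompose_uncertainty(samples)
print("total:", total)
print("aleatoric:", aleatoric)
print("epistemic:", epistemic)
```

In this toy input, both examples have the same total uncertainty (about ln 2), but the first is almost entirely aleatoric while the second is almost entirely epistemic. A deterministic model produces only a single forward pass, so it cannot make this distinction.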

