ASAILGSD Dec 27, 2025

Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers

arXiv:2512.22564v1
h-index: 14
Originality: Incremental advance
AI Analysis

This work could improve clinical screening for respiratory conditions by raising sensitivity on a challenging medical dataset, though it is an incremental advance that combines existing techniques.

The paper tackles respiratory sound classification on the noisy, imbalanced ICBHI 2017 dataset by training an Audio Spectrogram Transformer with Sharpness-Aware Minimization, which steers optimization toward flatter loss minima. It reports a state-of-the-art score of 68.10% and a sensitivity of 68.31%.
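The summary does not define the "score". A minimal sketch follows, assuming it is the usual ICBHI challenge score: the mean of sensitivity (correctly classified abnormal cycles over all abnormal cycles) and specificity (correctly classified normal cycles over all normal cycles). The label encoding and the function name `icbhi_score` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed label encoding (not from the source): 0=normal, 1=crackle, 2=wheeze, 3=both
NORMAL = 0

def icbhi_score(y_true, y_pred):
    """Return (sensitivity, specificity, score) under the assumed ICBHI definition."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_normal = y_true == NORMAL
    correct = y_true == y_pred
    specificity = correct[is_normal].mean()    # normal cycles predicted as normal
    sensitivity = correct[~is_normal].mean()   # abnormal cycles given their exact class
    return sensitivity, specificity, (sensitivity + specificity) / 2

# Toy labels, invented for this example
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 2, 2, 0]
se, sp, score = icbhi_score(y_true, y_pred)
print(f"sensitivity={se:.3f} specificity={sp:.3f} score={score:.3f}")
```

Under this definition a model can reach a high score only by balancing both terms, which is why the sensitivity figure is reported separately.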

Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. More importantly, it reaches a sensitivity of 68.31%, a crucial improvement for reliable clinical screening. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
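The two training ingredients named in the abstract can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: a NumPy logistic regression stands in for the AST, the data are synthetic, the hyperparameters `lr` and `rho` are arbitrary, and inverse class frequency is only one plausible reading of "weighted sampling". The SAM step follows the standard two-pass recipe: move the weights by a radius `rho` along the normalized gradient, then apply the gradient measured at that perturbed point to the original weights.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby high-loss point, descend with its gradient."""
    _, g = loss_and_grad(w, X, y)
    e = rho * g / (np.linalg.norm(g) + 1e-12)   # perturbation of norm rho
    _, g_perturbed = loss_and_grad(w + e, X, y)
    return w - lr * g_perturbed                 # update the original weights

def inverse_frequency_weights(labels):
    """Per-sample weights so that each class carries equal total sampling mass."""
    counts = np.bincount(labels)
    return 1.0 / counts[labels]

# Synthetic imbalanced data (hypothetical): 180 majority vs 20 minority samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (180, 2)), rng.normal(1, 1, (20, 2))])
y = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])

weights = inverse_frequency_weights(y)
probs = weights / weights.sum()

w = np.zeros(2)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    idx = rng.choice(len(y), size=32, p=probs)  # class-balanced mini-batch
    w = sam_step(w, X[idx], y[idx])
final_loss, _ = loss_and_grad(w, X, y)
print(f"loss before={initial_loss:.3f} after={final_loss:.3f}")
```

Each SAM step costs two gradient evaluations instead of one, which is the usual price for the flatness-seeking update.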

