SD · Jun 1

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

Ziqi Ma, Mengyu Han, Anteng Cai, Zhanchong Liu, Bowen Feng, Hang Yu, Sheng Hu

arXiv:2606.02212 · 46.2

Predicted impact: top 62% in SD · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses data scarcity and class imbalance in respiratory sound classification, a critical problem for clinical diagnosis, but the improvement is incremental over existing generative methods.

The paper proposes C2GA, a class-controllable generative augmentation framework for respiratory sound classification. It uses a conditional VQ-VAE and a Transformer-based autoregressive prior to generate high-fidelity Mel-spectrograms. The authors report improved classification performance on imbalanced and noisy datasets, with state-of-the-art results on the ICBHI 2017 benchmark.

Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision.

Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation.

Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.
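
The three steps named under Methods (quantizing latents to discrete tokens, sampling a label-conditioned token sequence, and fusing tokens with a class prototype) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator, so additive fusion is an assumption here, `toy_prior` is a hypothetical stand-in for the trained Transformer, all sizes are toy values, and the Mel-spectrogram decoder is omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # latents: (T, D), codebook: (K, D) -> squared distances: (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def toy_prior(label, prefix, num_codes=8):
    """Hypothetical stand-in for the Transformer prior: favours code == label."""
    logits = np.zeros(num_codes)
    logits[label] = 5.0
    return logits

def sample_tokens(label, length, logits_fn, rng):
    """Autoregressively sample `length` token ids conditioned on a class label."""
    tokens = []
    for _ in range(length):
        logits = logits_fn(label, tokens)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return np.array(tokens)

def fuse_with_prototype(token_ids, codebook, prototypes, label):
    """Look up token embeddings and add the class prototype (assumed additive fusion)."""
    return codebook[token_ids] + prototypes[label]

rng = np.random.default_rng(0)
# Toy codebook: K=8 codes of dimension D=4, code k is the vector [k, k, k, k].
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))
# One prototype vector per class (4 classes, chosen only for illustration).
prototypes = rng.normal(size=(4, 4))

# Step 1: encoder outputs near codes 3, 1, 5 snap to those token ids.
latents = codebook[[3, 1, 5]] + 0.1
token_ids = quantize(latents, codebook)

# Step 2: sample a label-conditioned token sequence from the stand-in prior.
generated = sample_tokens(label=2, length=6, logits_fn=toy_prior, rng=rng)

# Step 3: fuse generated tokens with the class prototype; a decoder (omitted)
# would map this (T, D) array to a Mel-spectrogram.
fused = fuse_with_prototype(generated, codebook, prototypes, label=2)
```

In a real system the codebook and prototypes would be learned jointly with the encoder and decoder, and the prior would condition on the full token prefix rather than ignore it as the stand-in does.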
