SD · Jun 1

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

arXiv:2606.02212 · 46.2
Predicted impact top 62% in SD · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses data scarcity and class imbalance in respiratory sound classification, a critical problem for clinical diagnosis, but the improvement is incremental over existing generative methods.

C2GA is a class-controllable generative augmentation framework for respiratory sound classification. It uses a conditional VQ-VAE and a Transformer-based autoregressive prior to generate high-fidelity Mel-spectrograms. It improves classification performance on imbalanced and noisy datasets, achieving state-of-the-art results on the ICBHI 2017 benchmark.

Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision.

Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation.

Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.
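
The Methods describe two operations that a small example may make more concrete: mapping encoder outputs to discrete acoustic tokens, and fusing generated tokens with a class prototype before decoding. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names `quantize` and `fuse_with_prototype`, the toy codebook and prototype values, and in particular the additive fusion rule are all hypothetical; the abstract does not say how fusion is performed.

```python
import numpy as np

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry.

    latents:  (T, D) array of encoder outputs (T time steps, D dimensions)
    codebook: (K, D) array of K learned code vectors
    """
    # Squared Euclidean distance between every latent and every code: (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def fuse_with_prototype(token_ids, codebook, prototypes, label):
    """Look up token embeddings and add the class prototype to each one.

    ASSUMPTION: additive fusion is used here only for illustration.
    """
    return codebook[token_ids] + prototypes[label]

# Toy values (hypothetical): 8 codes of dimension 4, 4 class prototypes.
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))
prototypes = np.eye(4)

# Latents that sit close to codes 2, 5, 5 and 0.
latents = codebook[[2, 5, 5, 0]] + 0.1
token_ids = quantize(latents, codebook)

# Stand-in for tokens sampled from the autoregressive prior for class 1.
fused = fuse_with_prototype(token_ids, codebook, prototypes, label=1)
```

In a full pipeline, `token_ids` would come from the label-conditioned Transformer prior rather than from quantizing real latents, and `fused` would be passed to the VQ-VAE decoder to produce a Mel-spectrogram.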

