Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
This work addresses the scalability of microphone conversion for sound event classification systems, though it is incremental over prior CycleGAN-based methods.
The paper tackles device variability in sound event classification by proposing a unified generative framework that enables many-to-many device mappings, outperforming state-of-the-art methods by 2.6% and reducing variability by 0.8% in macro-average F1 score.
We present Unified Microphone Conversion, a unified generative framework designed to bolster sound event classification (SEC) systems against device variability. While our prior CycleGAN-based methods effectively simulate device characteristics, they require a separate model for each device pair, which limits scalability. Our approach overcomes this constraint by conditioning the generator on frequency response data, enabling many-to-many device mappings through unpaired training. We integrate the frequency response information via Feature-wise Linear Modulation, further enhancing scalability. Additionally, incorporating synthetic frequency response differences makes the framework more applicable to real-world use. Experimental results show that our method outperforms the state-of-the-art by 2.6% and reduces variability by 0.8% in macro-average F1 score.
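To illustrate the conditioning mechanism, here is a minimal NumPy sketch of Feature-wise Linear Modulation: a conditioning vector is projected to a per-channel scale (gamma) and shift (beta), which are then applied to a feature map. This is not the paper's implementation. The class name `FiLM`, the layer sizes, the random weights, and the choice of a target-minus-source frequency response difference as the conditioning vector are all assumptions made for illustration.

```python
import numpy as np


class FiLM:
    """Illustrative FiLM layer: per-channel scale and shift from a condition."""

    def __init__(self, cond_dim, num_channels, rng):
        # Hypothetical linear projections from the condition to gamma and beta.
        self.w_gamma = rng.normal(0.0, 0.1, size=(num_channels, cond_dim))
        self.b_gamma = np.ones(num_channels)   # bias 1: identity scale at zero condition
        self.w_beta = rng.normal(0.0, 0.1, size=(num_channels, cond_dim))
        self.b_beta = np.zeros(num_channels)   # bias 0: no shift at zero condition

    def __call__(self, features, cond):
        # features: (channels, freq_bins, frames); cond: (cond_dim,)
        gamma = self.w_gamma @ cond + self.b_gamma
        beta = self.w_beta @ cond + self.b_beta
        # Broadcast the per-channel parameters over frequency and time.
        return gamma[:, None, None] * features + beta[:, None, None]


rng = np.random.default_rng(0)
num_bins = 8  # assumed resolution of the frequency response vector

# Assumed conditioning input: target response minus source response (in dB).
source_response_db = rng.normal(size=num_bins)
target_response_db = rng.normal(size=num_bins)
cond = target_response_db - source_response_db

film = FiLM(cond_dim=num_bins, num_channels=4, rng=rng)
features = rng.normal(size=(4, 16, 10))  # stand-in for a generator feature map
modulated = film(features, cond)

# With a zero condition (no device difference) the layer is the identity here.
unchanged = film(features, np.zeros(num_bins))
```

Because the device pair enters only through `cond`, a single set of generator weights can in principle serve any source and target combination, which is the scalability argument made above. In a trained model the projections would be learned jointly with the generator rather than drawn at random.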