LG · SD · AS · ML · Aug 12, 2020

Improving Stability of LS-GANs for Audio and Speech Signals

arXiv:2008.05454v1
Originality: Incremental advance
AI Analysis

This work targets GAN training instability in the audio and speech domains, a concern for both researchers and practitioners. It appears incremental because it builds on the existing LS-GAN framework.

The paper tackles GAN training instability for audio and speech signals by introducing a new similarity metric in the unitary space of the Schur decomposition. Compared to baseline LS-GANs, it reports better spectrogram quality as measured by the Fréchet inception distance (where lower is better) and higher signal-to-noise ratios for the reconstructed signals.

In this paper we address the instability issue of generative adversarial networks (GANs) by proposing a new similarity metric in the unitary space of the Schur decomposition for 2D representations of audio and speech signals. We show that encoding the departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms. We demonstrate the effectiveness of binding this metric for enhancing training stability, with less mode collapse compared to baseline GANs. Experimental results on subsets of the UrbanSound8k and Mozilla Common Voice datasets show considerable improvements in the quality of the generated samples as measured by the Fréchet inception distance. Moreover, signals reconstructed from these samples achieve a higher signal-to-noise ratio compared to regular LS-GANs.
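To make the central quantity concrete, the sketch below computes the departure from normality of a square 2D array from its complex Schur form, using the standard definition: the Frobenius norm of the strictly upper-triangular part of the Schur factor. It assumes square spectrograms, and the `normality_gap` helper is a hypothetical illustration of comparing a real and a generated spectrogram. It is not the paper's actual generator loss, which the abstract does not spell out.

```python
import numpy as np
from scipy.linalg import schur


def departure_from_normality(matrix: np.ndarray) -> float:
    """Frobenius norm of the strictly upper part of the complex Schur form."""
    a = np.asarray(matrix)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square 2D array")
    # Complex Schur form: a = Z @ T @ Z.conj().T, Z unitary, T upper triangular.
    t, _ = schur(a, output="complex")
    # T = diag(eigenvalues) + N, where N is strictly upper triangular.
    n = np.triu(t, k=1)
    return float(np.linalg.norm(n, "fro"))


def normality_gap(real_spec: np.ndarray, fake_spec: np.ndarray) -> float:
    """Hypothetical similarity term (an assumption, not the paper's formula)."""
    return abs(departure_from_normality(real_spec)
               - departure_from_normality(fake_spec))


# Random arrays stand in for real and generated spectrograms (assumed 64 x 64).
rng = np.random.default_rng(0)
real_spec = rng.random((64, 64))
fake_spec = rng.random((64, 64))

# A symmetric matrix is normal, so its departure is zero up to rounding error.
symmetric = real_spec + real_spec.T
print(departure_from_normality(symmetric))
print(normality_gap(real_spec, fake_spec))
```

A term like this could be added to a generator objective as a penalty, but its weighting and exact form here would be design choices of the sketch, not details taken from the paper.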

