AS | LG | Jun 4, 2021

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

arXiv:2106.02297v2 | 61 citations
AI Analysis

This work addresses audio quality issues in speech synthesis for applications like text-to-speech, but it is incremental as it builds on existing GAN-based vocoders.

The paper tackles the problem of spectral artifacts in neural vocoders by proposing Fre-GAN, which achieves frequency-consistent audio synthesis and reduces the gap to ground-truth audio to only 0.03 MOS.

Although recent work on neural vocoders has improved the quality of synthesized audio, a gap remains between generated and ground-truth audio in frequency space. This difference leads to spectral artifacts such as hissing noise or reverberation, and thus degrades sample quality. In this paper, we propose Fre-GAN, which achieves frequency-consistent audio synthesis with highly improved generation quality. Specifically, we first present a resolution-connected generator and resolution-wise discriminators, which help learn various scales of spectral distributions over multiple frequency bands. Additionally, to reproduce high-frequency components accurately, we leverage the discrete wavelet transform in the discriminators. In our experiments, Fre-GAN achieves high-fidelity waveform generation with a gap of only 0.03 MOS compared to ground-truth audio while outperforming standard models in quality.
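
To illustrate why a discrete wavelet transform can help preserve high-frequency components, here is a minimal sketch, not the authors' implementation. It assumes a single-level Haar wavelet (the abstract does not name the wavelet) and compares it with plain pairwise averaging, a common downsampling choice used here only as a point of contrast. The function names `haar_dwt`, `haar_idwt`, and `avg_pool` are hypothetical helpers introduced for this example.

```python
import math

def haar_dwt(x):
    """Single-level Haar DWT: split an even-length signal into
    low-frequency and high-frequency bands of half the length each."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / s)
        x.append((l - h) / s)
    return x

def avg_pool(x):
    """Downsample by averaging adjacent pairs (assumed baseline for contrast)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

# A signal that alternates every sample, i.e. pure highest-frequency content.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

low, high = haar_dwt(signal)
pooled = avg_pool(signal)
restored = haar_idwt(low, high)

print("averaged:", pooled)    # the alternating pattern cancels out
print("DWT high band:", high) # the pattern survives in the high band
print("restored:", restored)  # the two bands together recover the input
```

In this toy case averaging maps the signal to zeros, so a discriminator that only sees the averaged signal would receive no information about that component. The wavelet split keeps it in the high band, and the two bands together reconstruct the input exactly.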

Code Implementations: 2 repos