AS · AI · SD · SP · Jun 10, 2024

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

arXiv:2406.06111v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses artifact reduction in speech-synthesis vocoders, offering an incremental improvement in audio quality for applications such as text-to-speech.

The paper targets audible artifacts, such as tonal artifacts, in non-autoregressive GAN-based neural vocoders. It proposes JenGAN, a training strategy that stacks shifted low-pass filters to enforce shift-equivariance, and it reports significantly better scores on most evaluation metrics.

Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
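
To make the idea of "stacking shifted low-pass filters" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the filter design, so the Hann-windowed sinc kernel, the cutoff, the kernel length, the shift values, and the function names `shifted_lowpass_kernel`, `stack_shifted_filters`, and `apply_filter_bank` are all assumptions made for illustration. The sketch only shows that convolving a band-limited signal with a low-pass kernel delayed by a fractional number of samples approximates a time-shifted copy of that signal, which is the property a shift-equivariant training scheme would rely on.

```python
import numpy as np


def shifted_lowpass_kernel(shift, cutoff=0.25, half_width=32):
    """Hann-windowed sinc low-pass kernel delayed by `shift` samples.

    Assumed design, not taken from the paper.
    cutoff is in cycles per sample (0.5 is the Nyquist frequency).
    """
    n = np.arange(-half_width, half_width + 1)
    t = n - shift  # sample positions relative to the shifted kernel center
    ideal = 2.0 * cutoff * np.sinc(2.0 * cutoff * t)
    # Hann window centered on the shifted peak; zero outside its support.
    support = half_width + 1
    window = np.where(np.abs(t) < support,
                      0.5 * (1.0 + np.cos(np.pi * t / support)),
                      0.0)
    kernel = ideal * window
    return kernel / kernel.sum()  # normalize to unit gain at DC


def stack_shifted_filters(shifts, cutoff=0.25, half_width=32):
    """Stack one kernel per shift into an array of shape (len(shifts), 2*half_width+1)."""
    return np.stack([shifted_lowpass_kernel(s, cutoff, half_width) for s in shifts])


def apply_filter_bank(signal, bank):
    """Convolve a 1-D signal with every kernel in the bank (same-length output)."""
    return np.stack([np.convolve(signal, k, mode="same") for k in bank])


if __name__ == "__main__":
    n = np.arange(1024)
    freq = 0.05  # well below the assumed cutoff of 0.25 cycles/sample
    x = np.sin(2 * np.pi * freq * n)
    shifts = [0.0, 0.25, 0.5, 0.75]  # hypothetical sub-sample shifts
    y = apply_filter_bank(x, stack_shifted_filters(shifts))
    for s, row in zip(shifts, y):
        reference = np.sin(2 * np.pi * freq * (n - s))
        # Ignore the edges, where the convolution sees zero padding.
        err = np.max(np.abs(row[64:-64] - reference[64:-64]))
        print(f"shift={s:.2f}  max interior error={err:.2e}")
```

Because the filters in this sketch are applied outside any model, a scheme along these lines could in principle be used during training only and dropped at inference, consistent with the abstract's statement that the model structure used during inference is preserved.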

