SD · AI · AS · Jun 22, 2023

MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

arXiv:2306.12785v1 · 4 citations · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses speech synthesis for applications requiring high-quality audio, though it is incremental as it builds on existing GAN and vocoder techniques.

The paper introduces MFCCGAN, a speech synthesizer that uses adversarial learning to generate raw speech waveforms from MFCC inputs. It reports improvements of up to 53% in intelligibility (STOI) and up to 78% in quality (NISQA score) over the Librosa MFCC-inversion baseline.

In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of GAN models, it produces speech with higher intelligibility than a rule-based MFCC-based speech synthesizer WORLD. We evaluated the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that our proposed system outperforms Librosa MFCC-inversion, with an increase of about 26% up to 53% in STOI and 16% up to 78% in NISQA score. Compared with the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family, it achieves a rise of about 10% in intelligibility and about 4% in naturalness. Moreover, WORLD needs additional data such as F0. Finally, using a perceptual loss based on STOI in the discriminators could further improve quality. WebMUSHRA-based subjective tests also confirm the quality of the proposed approach.

Code Implementations: 1 repo