SD · AI · AS · Sep 21, 2023

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

arXiv:2309.11725v2 · 8 citations · h-index: 15 · Has Code
Originality: Incremental advance
AI Analysis

This work improves speech editing for users by enhancing fluency, though it is incremental as it builds on existing neural network-based techniques.

The paper tackles the problem of text-based speech editing by addressing local and global fluency issues, proposing FluentEditor with acoustic and prosody consistency constraints, which outperforms advanced baselines in naturalness and fluency on the VCTK dataset.

Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency in the context and the original utterance. To maintain speech fluency, we propose a fluent speech editing model, termed FluentEditor, by considering a fluency-aware training criterion in TSE training. Specifically, the acoustic consistency constraint aims to keep the transition between the edited region and its neighboring acoustic segments as smooth as in the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited regions remain consistent with the overall style of the original utterance. The subjective and objective experimental results on VCTK demonstrate that FluentEditor outperforms all advanced baselines in terms of naturalness and fluency. The audio samples and code are available at https://github.com/Ai-S2-Lab/FluentEditor.
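The two constraints named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a mean squared frame difference, and the use of an average mel frame as a stand-in for a learned prosody representation are all assumptions made here for illustration. Spectrograms are plain lists of frames, and the edited region is the half-open span `[start, end)` with `1 <= start < end < len(mel)`.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_delta(a: Frame, b: Frame) -> float:
    """Mean squared difference between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def boundary_jumps(mel: Sequence[Frame], start: int, end: int) -> Tuple[float, float]:
    """Frame-to-frame change across the left and right edges of mel[start:end]."""
    left = frame_delta(mel[start - 1], mel[start])
    right = frame_delta(mel[end - 1], mel[end])
    return left, right


def acoustic_consistency_loss(pred: Sequence[Frame], target: Sequence[Frame],
                              start: int, end: int) -> float:
    """Hypothetical loss: penalise boundary transitions in the prediction
    that differ from the transitions at the same positions in the ground truth."""
    pred_left, pred_right = boundary_jumps(pred, start, end)
    tgt_left, tgt_right = boundary_jumps(target, start, end)
    return (pred_left - tgt_left) ** 2 + (pred_right - tgt_right) ** 2


def mean_frame(frames: Sequence[Frame]) -> List[float]:
    """Average frame; an assumed stand-in for a learned prosody embedding."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def prosody_consistency_loss(pred: Sequence[Frame], reference: Sequence[Frame],
                             start: int, end: int) -> float:
    """Hypothetical loss: penalise a mismatch between the edited region's
    summary and the summary of the whole original utterance."""
    region_summary = mean_frame(pred[start:end])
    utterance_summary = mean_frame(reference)
    return frame_delta(region_summary, utterance_summary)


# Toy one-bin "spectrograms": the prediction has an abrupt jump at the left edge.
target = [[0.0], [1.0], [2.0], [3.0]]
pred = [[0.0], [5.0], [2.0], [3.0]]
acoustic = acoustic_consistency_loss(pred, target, start=1, end=3)
prosody = prosody_consistency_loss(pred, target, start=1, end=3)
```

In a training setup, terms like these would presumably be added with weights to the usual reconstruction loss on the edited region. The weighting and the actual feature extractors are not specified in the abstract and are left out here.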

Code Implementations: 1 repo
