CycleFlow: Purify Information Factors by Cycle Loss
This work addresses poor factor disentanglement, a bottleneck in speech processing for voice conversion and editing, and represents an incremental improvement over existing methods.
The paper tackles poor factor disentanglement in SpeechFlow, a factorization model based on the information bottleneck. It proposes CycleFlow, which combines random factor substitution with a cycle loss to reduce mutual information among factors and yields clearly better voice conversion performance.
SpeechFlow is a powerful factorization model based on the information bottleneck (IB), and its effectiveness has been reported in several studies. A potential problem of SpeechFlow, however, is that if the IB channels are not well designed, the resulting factors cannot be well disentangled. In this study, we propose CycleFlow, a model that combines random factor substitution and cycle loss to address this problem. Experiments on voice conversion tasks demonstrate that this simple technique can effectively reduce mutual information among individual factors and produce clearly better conversion than the IB-based SpeechFlow. CycleFlow can also be used as a powerful tool for speech editing; we demonstrate this usage through an emotion perception experiment.
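The abstract does not spell out how random factor substitution and cycle loss are combined, so the following is only a toy sketch of one plausible reading, not the CycleFlow implementation. It assumes that one randomly chosen factor of an utterance is swapped with the same factor from another utterance, that the mixed factors are decoded and re-encoded, and that the loss is the mean squared distance between the re-encoded and the mixed factors. The functions `encode`, `decode`, and `cycle_loss`, along with the `leak` parameter that mimics entanglement, are hypothetical names introduced for illustration.

```python
import random


def encode(x, num_factors):
    """Toy encoder (assumption): split a feature vector into equal-sized factor chunks."""
    size = len(x) // num_factors
    return [x[i * size:(i + 1) * size] for i in range(num_factors)]


def decode(factors, leak=0.0):
    """Toy decoder (assumption): concatenate the factors.

    A nonzero `leak` adds a scaled copy of the first factor to every other
    factor, mimicking a model whose factors are not disentangled.
    """
    first = factors[0]
    out = list(first)
    for f in factors[1:]:
        out.extend(v + leak * u for v, u in zip(f, first))
    return out


def cycle_loss(x_a, x_b, num_factors, leak=0.0, rng=random):
    """Swap one random factor of x_a with that of x_b, then decode and re-encode.

    Returns the mean squared error between the re-encoded factors and the
    mixed factors that were fed to the decoder.
    """
    factors_a = encode(x_a, num_factors)
    factors_b = encode(x_b, num_factors)

    # Random factor substitution: replace exactly one factor.
    k = rng.randrange(num_factors)
    mixed = list(factors_a)
    mixed[k] = factors_b[k]

    # Cycle: decode the mixed factors, then encode the result again.
    recovered = encode(decode(mixed, leak), num_factors)

    squared = [
        (r - m) ** 2
        for rec, mix in zip(recovered, mixed)
        for r, m in zip(rec, mix)
    ]
    return sum(squared) / len(squared)


x_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_b = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
clean = cycle_loss(x_a, x_b, num_factors=3, leak=0.0, rng=random.Random(0))
leaky = cycle_loss(x_a, x_b, num_factors=3, leak=0.5, rng=random.Random(0))
```

In this toy setting the loss is zero when the factors pass through the cycle independently (`leak=0.0`) and positive when one factor bleeds into the others, which is the kind of signal a cycle loss could use to penalize entangled factors. A real model would use learned neural encoders and a decoder in place of these fixed functions.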