Discrete Optimal Transport and Voice Conversion
This work addresses voice conversion for audio processing applications, but it appears incremental because it applies an existing method to a specific domain.
The paper tackles voice conversion by using discrete optimal transport to align audio embeddings between speakers. It reports high-quality, effective conversion and finds that applying the mapping as a post-processing step can cause synthetic audio to be misclassified as real.
In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ a discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can cause synthetic audio to be misclassified as real.
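The abstract does not say how the transport map is computed. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that each speaker's frame-level embeddings are stored as rows of a NumPy array, that every frame carries uniform weight, and that the cost is squared Euclidean distance. It also uses an entropy-regularized (Sinkhorn) solver in place of an exact linear-program solver. Each source embedding is then moved to the plan-weighted average of the target embeddings (barycentric projection). The names `sinkhorn_plan`, `ot_map`, `reg` and `n_iters` are hypothetical.

```python
import numpy as np


def _logsumexp(M, axis):
    """Numerically stable log(sum(exp(M))) along one axis."""
    m = M.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(M - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_plan(X, Y, reg=0.05, n_iters=500):
    """Entropy-regularized OT plan between uniform point clouds X (n, d) and Y (m, d).

    Assumption: entropic (Sinkhorn) OT stands in for whatever solver the paper uses.
    Returns P of shape (n, m) with rows summing to ~1/n and columns to 1/m.
    """
    n, m = X.shape[0], Y.shape[0]
    log_a = np.full(n, -np.log(n))  # uniform source weights, log scale
    log_b = np.full(m, -np.log(m))  # uniform target weights, log scale
    # Squared Euclidean cost between every source and every target embedding.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    C = C / max(C.max(), 1e-12)  # rescale so `reg` does not depend on embedding scale
    log_K = -C / reg
    log_u = np.zeros(n)
    log_v = np.zeros(m)
    # Alternate scaling updates in the log domain to avoid underflow of exp(-C/reg).
    for _ in range(n_iters):
        log_u = log_a - _logsumexp(log_K + log_v[None, :], axis=1)
        log_v = log_b - _logsumexp(log_K + log_u[:, None], axis=0)
    return np.exp(log_u[:, None] + log_K + log_v[None, :])


def ot_map(X, Y, reg=0.05, n_iters=500):
    """Map each source embedding to the plan-weighted mean of the target embeddings."""
    P = sinkhorn_plan(X, Y, reg=reg, n_iters=n_iters)
    return (P @ Y) / P.sum(axis=1, keepdims=True)
```

Smaller `reg` values bring the plan closer to the exact discrete OT solution, at the cost of slower convergence. The log-domain updates keep the kernel from underflowing in that regime. In a conversion pipeline, the mapped source-speaker embeddings would presumably be passed on to a decoder or vocoder, which this sketch does not cover.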