AS · LG · SD · May 7, 2025

Discrete Optimal Transport and Voice Conversion

arXiv:2505.04382v2 · 2 citations · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This paper addresses voice conversion for audio processing applications, but it appears incremental, as it applies an existing method to a specific domain.

The paper tackles voice conversion by using discrete optimal transport to align audio embeddings between speakers. It reports high quality and effectiveness, and it finds that applying discrete optimal transport as a post-processing step can cause synthetic audio to be misclassified as real.

In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can lead to the incorrect classification of synthetic audio as real.
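
To make the idea of aligning embeddings with a discrete optimal transport map more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes uniform weights on both sets of vectors, a squared Euclidean cost, and Sinkhorn iterations (an entropic-regularized approximation to exact discrete optimal transport), followed by a barycentric projection that moves each source vector to a weighted average of target vectors. The function names, the `reg` and `n_iters` parameters, and the random arrays standing in for speaker embeddings are all hypothetical.

```python
import numpy as np


def pairwise_cost(source, target):
    """Squared Euclidean cost between every source and target vector."""
    diff = source[:, None, :] - target[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sinkhorn_plan(cost, reg=0.05, n_iters=500):
    """Approximate transport plan between two uniform discrete measures.

    Assumption: entropic regularization (Sinkhorn), not an exact OT solver.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on source vectors
    b = np.full(m, 1.0 / m)  # uniform mass on target vectors
    # Rescale the cost to [0, 1] so exp() does not underflow for this reg.
    kernel = np.exp(-(cost / cost.max()) / reg)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # match row marginals
        v = b / (kernel.T @ u)  # match column marginals
    return u[:, None] * kernel * v[None, :]


def barycentric_map(plan, target):
    """Send each source vector to the plan-weighted mean of target vectors."""
    weights = plan / plan.sum(axis=1, keepdims=True)
    return weights @ target


# Hypothetical stand-ins for frame-level embeddings of two speakers.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(40, 8))
target_emb = rng.normal(loc=2.0, size=(50, 8))

plan = sinkhorn_plan(pairwise_cost(source_emb, target_emb))
converted = barycentric_map(plan, target_emb)
```

In a real voice conversion pipeline the two arrays would presumably come from a speech encoder, and `converted` would be passed to a vocoder; both steps are outside this sketch. Because each converted vector is a convex combination of target vectors, it always lies inside the region spanned by the target speaker's embeddings.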

