Jio Gim

SD
h-index: 2
Papers: 3
Citations: 4
Novelty: 48%
AI Score: 31

3 Papers

SD | Oct 10, 2023
AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion

Haeyun Choi, Jio Gim, Yuho Lee et al.

This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality because they rely on a carefully designed bottleneck structure. Moreover, models trained solely with a self-reconstruction loss struggle to reproduce the voices of different speakers. To address these issues, we propose a cycle-consistency loss that considers conversion back and forth between the target and source speakers. Additionally, stacked randomly shuffled mel-spectrograms and label smoothing are used during speaker encoder training to extract a time-independent global speaker representation from speech, which is the key to zero-shot conversion. Our model outperforms existing state-of-the-art models in both subjective and objective evaluations. Furthermore, it supports cross-lingual voice conversion and improves the quality of the synthesized speech.
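The three training ingredients named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the hypothetical `convert(mel, speaker_embedding)` callable standing in for the conversion model, the L1 distance in the cycle loss, shuffling whole time frames independently for each stacked copy, and the smoothing value `eps`. Shuffling frames destroys temporal order but keeps every frame, so time-averaged statistics, which a global speaker representation can rely on, are unchanged.

```python
import numpy as np


def cycle_consistency_loss(convert, mel_src, emb_src, emb_tgt):
    """L1 cycle loss (assumed form): source -> target voice -> back to source.

    `convert(mel, speaker_embedding)` is a hypothetical stand-in for the
    voice conversion model; mel has shape (n_mels, T).
    """
    converted = convert(mel_src, emb_tgt)        # source content, target voice
    reconstructed = convert(converted, emb_src)  # converted back to source voice
    return float(np.mean(np.abs(reconstructed - mel_src)))


def shuffled_stack(mel, num_copies, rng):
    """Stack `num_copies` versions of `mel`, each with its time frames permuted.

    Assumption: frames (columns) are shuffled independently per copy, so the
    output of shape (num_copies, n_mels, T) carries no temporal order.
    """
    copies = [mel[:, rng.permutation(mel.shape[1])] for _ in range(num_copies)]
    return np.stack(copies, axis=0)


def smooth_labels(speaker_ids, num_speakers, eps=0.1):
    """Standard label smoothing: (1 - eps) * one_hot + eps / num_speakers."""
    one_hot = np.eye(num_speakers)[speaker_ids]
    return (1.0 - eps) * one_hot + eps / num_speakers
```

With `eps=0.1` and 5 speakers, the true speaker receives a target of 0.92 and every other speaker receives 0.02. The smoothed targets would then replace the hard one-hot targets in the speaker classifier's cross-entropy loss.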

SD | Aug 9, 2025
Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody

Jinsung Yoon, Wooyeol Jeong, Jio Gim et al.

Emotional voice conversion (EVC) aims to modify the emotional style of speech while preserving its linguistic content. In practical EVC, controllability (the ability to control speaker identity and emotional style independently using distinct references) is crucial. However, existing methods often fail to fully disentangle these attributes and cannot model fine-grained emotional expressions such as temporal dynamics. We propose Maestro-EVC, a controllable EVC framework that enables independent control of content, speaker identity, and emotion by effectively disentangling each attribute from separate references. We further introduce a temporal emotion representation and explicit prosody modeling with prosody augmentation to robustly capture and transfer the temporal dynamics of the target emotion, even under prosody-mismatched conditions. Experimental results confirm that Maestro-EVC achieves high-quality, controllable, and emotionally expressive speech synthesis.

LG | Feb 24, 2022
Impacts of Individual Fairness on Group Fairness from the Perspective of Generalized Entropy

Youngmi Jin, Jio Gim, Tae-Jin Lee et al.

This paper investigates how the degree of group fairness changes when the degree of individual fairness is actively controlled. As the metric quantifying individual fairness, we use generalized entropy (GE), which was recently introduced into the machine learning community. To control the degree of individual fairness, we design a classification algorithm that satisfies a given degree of individual fairness through empirical risk minimization (ERM) with a fairness constraint expressed in terms of GE. We show that this fair ERM problem is PAC learnable by proving that, for a hypothesis class of finite VC dimension and a sufficiently large sample, the true degree of fairness stays close to its empirical estimate with high probability. Our experiments show that strengthening individual fairness does not always improve group fairness.
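The abstract does not restate the GE formula. The sketch below uses the generalized entropy index as commonly defined for classifiers, $GE(\alpha) = \frac{1}{n\,\alpha(\alpha-1)} \sum_i \left[(b_i/\mu)^\alpha - 1\right]$, where $\mu$ is the mean benefit. It also takes the benefit $b_i = \hat{y}_i - y_i + 1$ from that common definition. Both are assumptions about the paper's exact setup. The value is 0 when every individual receives the same benefit and grows as benefits become more unequal. Inside the fair ERM described above, this quantity would be the one constrained to stay below a chosen threshold.

```python
import numpy as np


def benefits(y_true, y_pred):
    """Per-individual benefit b_i = y_hat_i - y_i + 1 (assumed definition).

    For binary labels: false positive -> 2, correct -> 1, false negative -> 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return y_pred - y_true + 1.0


def generalized_entropy(y_true, y_pred, alpha=2.0):
    """Generalized entropy index GE(alpha) of the benefit distribution."""
    b = benefits(y_true, y_pred)
    mu = b.mean()
    if mu <= 0:
        raise ValueError("mean benefit must be positive")
    r = b / mu
    if alpha == 1:
        # Theil index, using the convention 0 * log(0) = 0.
        nz = r > 0
        return float(np.sum(r[nz] * np.log(r[nz])) / len(r))
    if alpha == 0:
        if np.any(r == 0):
            raise ValueError("GE(0) is undefined when some benefit is 0")
        return float(-np.mean(np.log(r)))
    return float(np.mean(r ** alpha - 1.0) / (alpha * (alpha - 1.0)))
```

For example, with `y_true = [1, 0, 1, 0]` and `y_pred = [1, 1, 0, 0]` the benefits are `[1, 2, 0, 1]` and the mean is 1. That gives `GE(2) = (0 + 3 - 1 + 0) / (4 * 2) = 0.25`.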