16.5 · CL · May 13
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
Kunil Lee, Ki-Young Shin, Jong-Hyeok Lee et al.
Multilingual knowledge editing (MKE) remains challenging because language-specific edits interfere with one another, even when locate-then-edit methods work well in monolingual settings. This paper focuses on three issues: the effectiveness of vector merging methods for MKE, the extent to which Task Singular Vectors for Merging (TSVM) can reduce multilingual interference, and the influence of the weight scaling factor and rank compression ratio on performance. We evaluate six merging variants with two popular backbone large language models, two base knowledge editing methods, and 12 languages on the MzsRE benchmark under a large-scale batch-editing setting. Our results show that vector summation with shared covariance is the most reliable overall strategy, whereas simple summation without shared covariance performs poorly. TSVM improves performance in some settings, but its ability to mitigate multilingual interference is limited. We also find that performance is sensitive to both weight scale and rank ratio, with larger-than-default scaling and relatively low rank often yielding better results. These findings clarify the practical strengths and limits of current vector merging methods for MKE and provide guidance for future multilingual knowledge editing research.
SD · Oct 10, 2023
AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion
Haeyun Choi, Jio Gim, Yuho Lee et al.
This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality due to their reliance on a carefully designed bottleneck structure. Moreover, models relying solely on a self-reconstruction loss struggle to reproduce different speakers' voices. To address these issues, we propose a cycle-consistency loss that considers conversion back and forth between target and source speakers. Additionally, stacked random-shuffled mel-spectrograms and a label smoothing method are used during speaker encoder training to extract a time-independent global speaker representation from speech, which is key to zero-shot conversion. Our model outperforms existing state-of-the-art results in both subjective and objective evaluations. Furthermore, it facilitates cross-lingual voice conversion and enhances the quality of synthesized speech.
SD · Nov 25, 2024
QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
Youngjun Sim, Jinsung Yoon, Wooyeol Jeong et al.
Zero-shot voice conversion is a technique that alters the speaker identity of an input speech to match a target speaker using only a single reference utterance, without requiring additional training. Recent approaches extensively utilize self-supervised learning features with K-means quantization to extract high-quality content representations while removing speaker identity. However, this quantization process also eliminates fine-grained phonetic and prosodic variations, degrading intelligibility and prosody preservation. While prior works have primarily focused on quantized representations, quantization residuals remain underutilized and deserve further exploration. In this paper, we introduce a novel approach that fully utilizes quantization residuals by leveraging temporal properties of speech components. This facilitates the disentanglement of speaker identity and the recovery of phonetic and prosodic details lost during quantization. By applying only K-means quantization and linear projections, our method achieves simple yet effective disentanglement, without requiring complex architectures or explicit supervision. This allows for high-fidelity voice conversion trained solely with reconstruction losses. Experiments show that the proposed model outperforms existing methods across both subjective and objective metrics. It achieves superior intelligibility and speaker similarity, along with improved prosody preservation, highlighting the impact of our Linear Disentangler module.
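The quantization residual mentioned in the abstract can be illustrated with a minimal sketch: each feature frame is assigned to its nearest K-means centroid, and the residual is whatever the centroid fails to capture. The function names, the toy codebook, and the 2-dimensional frames below are hypothetical and chosen only for illustration; this is not the paper's Linear Disentangler, which additionally applies linear projections to the residuals.

```python
def nearest_centroid(frame, codebook):
    """Return the index of the codebook entry closest to `frame` (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def quantize_with_residual(frames, codebook):
    """Quantize each frame and keep the residual (frame minus centroid).

    Returns a list of (centroid_index, residual) pairs. Adding the residual
    back to the selected centroid reconstructs the original frame.
    """
    result = []
    for frame in frames:
        k = nearest_centroid(frame, codebook)
        residual = [f - c for f, c in zip(frame, codebook[k])]
        result.append((k, residual))
    return result


# Hypothetical toy data: two centroids, two frames.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2]]
quantized = quantize_with_residual(frames, codebook)
```

In this toy case the centroid index plays the role of the coarse content unit, while the residual carries the fine-grained detail that the abstract says is lost by quantization alone.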
SD · Aug 9, 2025
Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody
Jinsung Yoon, Wooyeol Jeong, Jio Gim et al.
Emotional voice conversion (EVC) aims to modify the emotional style of speech while preserving its linguistic content. In practical EVC, controllability, the ability to independently control speaker identity and emotional style using distinct references, is crucial. However, existing methods often struggle to fully disentangle these attributes and cannot model fine-grained emotional expressions such as temporal dynamics. We propose Maestro-EVC, a controllable EVC framework that enables independent control of content, speaker identity, and emotion by effectively disentangling each attribute from separate references. We further introduce a temporal emotion representation and explicit prosody modeling with prosody augmentation to robustly capture and transfer the temporal dynamics of the target emotion, even under prosody-mismatched conditions. Experimental results confirm that Maestro-EVC achieves high-quality, controllable, and emotionally expressive speech synthesis.
LG · Feb 24, 2022
Impacts of Individual Fairness on Group Fairness from the Perspective of Generalized Entropy
Youngmi Jin, Jio Gim, Tae-Jin Lee et al.
This paper investigates how the degree of group fairness changes when the degree of individual fairness is actively controlled. As a metric quantifying individual fairness, we consider generalized entropy (GE), recently introduced into the machine learning community. To control the degree of individual fairness, we design a classification algorithm that satisfies a given degree of individual fairness through empirical risk minimization (ERM) with a fairness constraint specified in terms of GE. We show the PAC learnability of the fair ERM problem by proving that, for finite VC dimension, the true fairness degree does not deviate much from the empirical one with high probability if the sample size is sufficiently large. Our experiments show that strengthening the degree of individual fairness does not always enhance group fairness.
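As a rough illustration of the GE metric named in the abstract, the sketch below computes the generalized entropy index GE(α) = (1 / (n·α·(α−1))) · Σ[(b_i/μ)^α − 1] over per-individual benefits b_i with mean μ. The benefit definition b_i = ŷ_i − y_i + 1 and the default α = 2 follow a common convention in the fairness literature and are assumptions here; the abstract does not state which choices the paper makes, and the function names are hypothetical.

```python
def benefits_from_predictions(y_true, y_pred):
    """Assumed benefit convention: b_i = y_pred_i - y_true_i + 1 (binary labels)."""
    return [p - t + 1 for t, p in zip(y_true, y_pred)]


def generalized_entropy(benefits, alpha=2.0):
    """Generalized entropy index for alpha not in {0, 1}.

    Zero means every individual receives the same benefit; larger values
    mean benefits are spread more unequally across individuals.
    """
    n = len(benefits)
    if n == 0:
        raise ValueError("benefits must be non-empty")
    if alpha in (0, 1):
        raise ValueError("this sketch does not cover the alpha = 0 or 1 limits")
    mu = sum(benefits) / n
    if mu <= 0:
        raise ValueError("mean benefit must be positive")
    total = sum((b / mu) ** alpha - 1.0 for b in benefits)
    return total / (n * alpha * (alpha - 1.0))


# Hypothetical toy labels: one false positive and one false negative.
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
ge = generalized_entropy(benefits_from_predictions(y_true, y_pred))
```

A fairness-constrained ERM of the kind the abstract describes would bound a quantity like `ge` during training; that optimization is not sketched here.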
NI · Nov 7, 2017
Pre-shared Key Agreement for Secure Public Wi-Fi
Seokseong Jeon, Chansu Yu, Young-Joo Suh
This paper presents a novel pre-shared key (PSK) agreement scheme to establish a secure connection between a Wi-Fi client and an access point (AP) without prior knowledge of a password. The standard IEEE 802.11 security method, Robust Security Network Association, widely known as Wi-Fi Protected Access (WPA) and WPA2, derives a shared cryptographic key if and only if the user provides the same password that the AP possesses, which imposes the inconvenience of obtaining and entering the password. The proposed scheme, Secure Open AP (SOAP), adopts two public key algorithms, the elliptic curve Diffie-Hellman key exchange algorithm (ECDH) and the elliptic curve digital signature algorithm (ECDSA), to establish a secure connection between a client and an AP without prior knowledge of a password. Implementation and experimental results demonstrate the viability of the proposed scheme.
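The core idea of agreeing on a key without a pre-shared password can be sketched with a Diffie-Hellman exchange. The example below deliberately swaps in classic finite-field Diffie-Hellman with toy parameters (p = 23, g = 5) in place of ECDH so that it runs with the standard library alone; it is not the SOAP protocol, the parameters are far too small for real use, and the function names are hypothetical. It also omits the signature step, so on its own it would be open to man-in-the-middle attacks, which is presumably why the abstract pairs the key exchange with ECDSA.

```python
import secrets

# Toy public parameters, for illustration only (not secure).
P = 23  # small prime modulus
G = 5   # generator


def make_keypair(p=P, g=G, private=None):
    """Return (private, public) where public = g^private mod p."""
    if private is None:
        private = secrets.randbelow(p - 3) + 2  # uniform in [2, p - 2]
    return private, pow(g, private, p)


def shared_secret(own_private, peer_public, p=P):
    """Derive the shared value from our private key and the peer's public key."""
    return pow(peer_public, own_private, p)


# Client and AP each generate a key pair and exchange only the public parts.
client_priv, client_pub = make_keypair()
ap_priv, ap_pub = make_keypair()

# Both sides arrive at the same value without ever sending a password.
client_key = shared_secret(client_priv, ap_pub)
ap_key = shared_secret(ap_priv, client_pub)
```

In a real deployment the shared value would be fed through a key derivation function before use, and the AP's public key would be authenticated, for example by a signature.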