CL · Jul 5, 2025

Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee

arXiv:2507.04018v1 · 2.7 · h-index: 7 · PAKDD

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in Korean NLP with a plug-and-play solution that enhances existing embedding models, though the contribution is incremental because it builds on prior methods for OOV handling.

The paper tackles the problem of handling Korean out-of-vocabulary (OOV) words by introducing KOPL, a framework that uses phoneme representation learning to improve performance on Korean NLP tasks, yielding an average gain of 1.9% over the state-of-the-art model.

In this study, we introduce KOPL, a novel framework for handling Korean OOV words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL incorporates both phoneme and word representations for Korean OOV words, enabling these representations to capture both the text and the phoneme information of words. We empirically demonstrate that KOPL significantly improves performance on Korean natural language processing (NLP) tasks and can be readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
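The letter-phoneme correspondence the abstract relies on can be seen directly in how Hangul is encoded: every precomposed syllable is an arithmetic combination of an initial consonant, a vowel, and an optional final consonant, so a word unpacks into a sequence of jamo (letters) that track its sounds closely. The sketch below is an illustration of that property, not KOPL's code. It uses the standard Unicode composition formula; `decompose`, `jamo_ngrams`, and the `<`/`>` boundary markers are hypothetical helpers showing how sub-word units for an unseen (OOV) word could be extracted. Jamo are letters, not true phonemes: an initial ㅇ is silent, and pronunciation rules such as liaison change surface sounds, so a real grapheme-to-phoneme step would go further than this.

```python
# Minimal sketch (not the authors' implementation): split Hangul syllables into
# jamo using the Unicode composition formula
#   code = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
NUM_MEDIALS = 21  # vowels (jungseong)
NUM_FINALS = 28   # final consonants (jongseong); index 0 means "no final"

# Hangul Compatibility Jamo, used here only because they are easy to read.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def decompose(word: str) -> list[str]:
    """Split each precomposed Hangul syllable into jamo; keep other characters.

    Note: letters are not exact phonemes (e.g. an initial ㅇ is silent), and
    no pronunciation rules are applied here.
    """
    out: list[str] = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial, rest = divmod(index, NUM_MEDIALS * NUM_FINALS)
            medial, final = divmod(rest, NUM_FINALS)
            out.append(INITIALS[initial])
            out.append(MEDIALS[medial])
            if final:  # syllables without a final consonant have final == 0
                out.append(FINALS[final])
        else:
            out.append(ch)
    return out


def jamo_ngrams(word: str, n: int = 3) -> list[str]:
    """Hypothetical helper: jamo n-grams with '<' and '>' word-boundary markers."""
    seq = ["<"] + decompose(word) + [">"]
    return ["".join(seq[i:i + n]) for i in range(len(seq) - n + 1)]


if __name__ == "__main__":
    print(decompose("한국어"))
    print(jamo_ngrams("한국어"))
```

For example, `decompose("한국어")` returns `['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ', 'ㅇ', 'ㅓ']`. Sub-units like these remain available even when the whole word is missing from an embedding model's vocabulary, which is the general intuition behind representing OOV words from their letters and sounds.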

