CL, AI | Dec 12, 2024

PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model

arXiv:2412.09102v1 | 3.42 citations | h-index: 1 | Natural Language Processing, Information Retrieval and AI Trends 2025

Originality: Incremental advance

AI Analysis

The paper addresses cross-linguistic name transliteration and onomastic research for multilingual applications. It represents an incremental improvement, with gains quantified in terms of character error rate.

The paper tackles multilingual phoneme-to-grapheme conversion for tasks such as transliteration and information retrieval. The model achieves a mean Character Error Rate (CER) of 0.055 and a character-level BLEU score of 0.914. With beam search, considering the top-3 candidates reduces the effective error rate by 52.7%, from 0.055 to 0.026.

This paper presents PolyIPA, a novel multilingual phoneme-to-grapheme conversion model designed for multilingual name transliteration, onomastic research, and information retrieval. The model leverages two helper models developed for data augmentation: IPA2vec for finding soundalikes across languages, and similarIPA for handling phonetic notation variations. Evaluated on a test set that spans multiple languages and writing systems, the model achieves a mean Character Error Rate of 0.055 and a character-level BLEU score of 0.914, with particularly strong performance on languages with shallow orthographies. The implementation of beam search further improves practical utility, with top-3 candidates reducing the effective error rate by 52.7% (to CER: 0.026), demonstrating the model's effectiveness for cross-linguistic applications.
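To make the reported metrics concrete, here is a minimal Python sketch of how a Character Error Rate and a top-k "effective" error rate could be computed. It is not the authors' evaluation code. It assumes CER is the Levenshtein edit distance divided by the reference length, and that the top-3 figure is an oracle score, that is, the lowest CER among the first three beam candidates. The function names (`edit_distance`, `cer`, `top_k_cer`, `relative_reduction`) are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate, assumed here to be edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def top_k_cer(ref: str, candidates: Sequence[str], k: int = 3) -> float:
    """Assumed 'effective' error rate: the best CER among the first k candidates."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return min(cer(ref, c) for c in candidates[:k])


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction, e.g. from top-1 CER to top-3 CER."""
    return (baseline - improved) / baseline
```

Under these assumptions, `relative_reduction(0.055, 0.026)` evaluates to roughly 0.527, which is consistent with the 52.7% reduction stated in the abstract. Whether the paper averages CER per word or over the whole test set is not stated in this summary, so the sketch leaves aggregation to the caller.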
