CL | Sep 24, 2025

OLaPh: Optimal Language Phonemizer

arXiv:2509.20086v1 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses phonemization accuracy for text-to-speech systems, particularly for out-of-vocabulary words. The contribution appears incremental, since it builds on existing methods.

The paper tackles phonemization (converting text to phonemes) for text-to-speech, focusing on challenging cases such as names and loanwords. It introduces OLaPh, a framework that combines lexica, NLP techniques, and probabilistic scoring. OLaPh shows improved accuracy in German and English evaluations, and an LLM trained on OLaPh-generated data yields further gains.

Phonemization, the conversion of text into phonemes, is a key step in text-to-speech. Traditional approaches use rule-based transformations and lexicon lookups, while more advanced methods apply preprocessing techniques or neural networks for improved accuracy on out-of-domain vocabulary. However, all systems struggle with names, loanwords, abbreviations, and homographs. This work presents OLaPh (Optimal Language Phonemizer), a framework that combines large lexica, multiple NLP techniques, and compound resolution with a probabilistic scoring function. Evaluations in German and English show improved accuracy over previous approaches, including on a challenging dataset. To further address unresolved cases, we train a large language model on OLaPh-generated data, which achieves even stronger generalization and performance. Together, the framework and LLM improve phonemization consistency and provide a freely available resource for future research.
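To make the idea of lexicon lookup with compound resolution and probabilistic scoring concrete, here is a minimal sketch in Python. It is not OLaPh's actual algorithm: the toy lexicon, the frequency values, and the `phonemize` function are all hypothetical, and the scoring rule (sum of log frequencies over the parts of a split) is an assumption chosen for illustration.

```python
import math

# Hypothetical toy lexicon: word -> (IPA-like phoneme string, relative frequency).
# Entries and frequencies are illustrative only, not taken from OLaPh.
LEXICON = {
    "haus": ("haʊs", 0.05),
    "tür": ("tyːɐ", 0.02),
    "hau": ("haʊ", 0.001),
    "s": ("s", 0.0001),
}


def phonemize(word, lexicon=LEXICON):
    """Return (phonemes, parts, score) for the best-scoring split, or None.

    A word is split into lexicon entries by dynamic programming. Each
    candidate split is scored by the sum of the log frequencies of its
    parts (an assumed scoring rule), and the highest score wins.
    """
    word = word.lower()
    n = len(word)
    # best[i] holds (log score, list of parts) for the prefix word[:i],
    # or None if that prefix cannot be covered by lexicon entries.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is None or piece not in lexicon:
                continue
            score = best[start][0] + math.log(lexicon[piece][1])
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    if best[n] is None:
        return None  # unresolved case: no full cover by lexicon entries
    score, parts = best[n]
    phonemes = "".join(lexicon[p][0] for p in parts)
    return phonemes, parts, score
```

Under these assumptions, `phonemize("Haustür")` prefers the split `haus` + `tür` over `hau` + `s` + `tür`, because every extra low-frequency part lowers the total score. A word with no full cover returns `None`, which corresponds to the unresolved cases the abstract hands to the trained language model.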

