CL, SD, AS · Jun 14, 2025

Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

arXiv:2506.12311v2 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This work addresses real-time text-to-speech for Modern Hebrew. It is an incremental advance because it adapts existing methods with lightweight components.
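To illustrate why "lightweight components" on top of an existing model can add little latency, here is a minimal sketch of a per-character linear head (logistic regression) over hidden vectors that a frozen encoder has already produced. This is an assumption-labeled illustration, not the paper's adaptor design: the class `LinearHead`, the function `predict_flags`, the weights, and the hidden vectors are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class LinearHead:
    """Hypothetical per-character binary classifier (stand-in for an adaptor).

    It adds one dot product per character on top of features the frozen
    encoder has already computed, so its extra cost is small.
    """

    def __init__(self, weights: Vector, bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def prob(self, hidden: Vector) -> float:
        if len(hidden) != len(self.weights):
            raise ValueError("hidden size does not match head size")
        z = sum(w * h for w, h in zip(self.weights, hidden)) + self.bias
        # Numerically stable logistic sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


def predict_flags(hidden_states: List[Vector], head: LinearHead,
                  threshold: float = 0.5) -> List[bool]:
    # One extra binary label per character (e.g. a hypothetical "stressed" flag).
    return [head.prob(h) >= threshold for h in hidden_states]


if __name__ == "__main__":
    head = LinearHead(weights=[1.0, -1.0], bias=0.0)
    states = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # made-up encoder outputs
    print(predict_flags(states, head))  # [True, False, True]
```

Because the encoder runs once and each head is a single dot product per character, several such heads can share the same features; whether Phonikud's adaptors take exactly this form is not stated here.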

The authors tackle real-time text-to-speech for Modern Hebrew with Phonikud, a lightweight grapheme-to-phoneme (G2P) system that outputs fully-specified IPA transcriptions. This yields more accurate phoneme prediction and enables real-time TTS models with a superior speed-accuracy trade-off.

Real-time text-to-speech (TTS) for Modern Hebrew is challenging due to the language's orthographic complexity. Existing solutions ignore crucial phonetic features such as stress that remain underspecified even when vowel marks are added. To address these limitations, we introduce Phonikud, a lightweight, open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified IPA transcriptions. Our approach adapts an existing diacritization model with lightweight adaptors, incurring negligible additional latency. We also contribute the ILSpeech dataset of transcribed Hebrew speech with IPA annotations, serving as a benchmark for Hebrew G2P, as training data for TTS systems, and enabling audio-to-IPA for evaluating TTS performance while capturing important phonetic details. Our results demonstrate that Phonikud G2P conversion more accurately predicts phonemes from Hebrew text compared to prior methods, and that this enables training of effective real-time Hebrew TTS models with superior speed-accuracy trade-offs. We release our code, data, and models at https://phonikud.github.io.
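A common way to score predicted IPA against a reference transcription, as in a G2P benchmark or an audio-to-IPA check of TTS output, is a Levenshtein-based phoneme error rate. The sketch below assumes that metric and one-code-point-per-token tokenization; the paper's actual metric and tokenization may differ, and `edit_distance` and `phoneme_error_rate` are illustrative names.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    # Levenshtein distance over tokens, keeping only one DP row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    # Errors normalized by reference length.
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings; the stress mark "ˈ" counts as its own token.
    reference = list("ʃaˈlom")
    hypothesis = list("ʃalom")  # stress mark missing
    print(edit_distance(reference, hypothesis))       # 1
    print(phoneme_error_rate(reference, hypothesis))  # 0.1666...
```

With stress marks treated as tokens, a transcription that omits stress is penalized. A metric computed on unstressed IPA would miss exactly the feature the abstract says prior systems leave underspecified.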
