CL SD AS Jul 16, 2024

A Language Modeling Approach to Diacritic-Free Hebrew TTS

arXiv:2407.12206v1 4 citations h-index: 33
Originality Synthesis-oriented
AI Analysis

This paper addresses a specific problem for Hebrew TTS systems: producing accurate pronunciation from text written without diacritics. The contribution is incremental, as it adapts existing methods to a domain-specific need.

The paper tackles text-to-speech (TTS) for modern Hebrew, which is rarely written with diacritics, by proposing a language modeling approach that operates on discrete speech representations and is trained on weakly supervised data. The reported results suggest it outperforms the evaluated diacritic-based baselines in content preservation and naturalness.

We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate how words should be pronounced; modern Hebrew, however, rarely uses them. Without diacritics, readers are expected to infer the correct pronunciation, and which phonemes to use, from context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this work, we propose a diacritics-free language modeling approach to Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild, weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines in both content preservation and naturalness of the generated speech. Samples can be found at the following link: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/
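
To make the ambiguity described in the abstract concrete, here is a minimal sketch (not taken from the paper) showing how three differently vocalized Hebrew words collapse to the same letter sequence once diacritics are removed. The helper name `strip_diacritics` and the three example words are illustrative assumptions. The code removes every Unicode nonspacing mark, which covers Hebrew vowel points but would also strip cantillation marks and accents in other scripts.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include Hebrew niqqud.

    Hypothetical helper for illustration; not part of the paper's pipeline.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Three vocalized words sharing the consonants samekh, pe, resh.
# Written with \u escapes to avoid right-to-left rendering issues.
VOCALIZED = {
    "sefer (book)": "\u05e1\u05b5\u05e4\u05b6\u05e8",
    "safar (counted)": "\u05e1\u05b8\u05e4\u05b7\u05e8",
    "sapar (barber)": "\u05e1\u05b7\u05e4\u05bc\u05b8\u05e8",
}

# Map each gloss to its undiacritized spelling.
bare_forms = {gloss: strip_diacritics(word) for gloss, word in VOCALIZED.items()}

# All three pronunciations share one written form in diacritic-free text.
distinct_spellings = set(bare_forms.values())

if __name__ == "__main__":
    for gloss, bare in bare_forms.items():
        print(f"{gloss}: {bare}")
    print("distinct undiacritized spellings:", len(distinct_spellings))
```

A system that reads only the undiacritized spelling has to choose among these pronunciations from context, which is the mapping problem the abstract refers to.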

