CL · Oct 9, 2020

Learning to Pronounce Chinese Without a Pronunciation Dictionary

arXiv:2010.04744v1 · 993 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of mapping written Chinese to its Mandarin pronunciation when no pronunciation dictionary is available, and it reports a substantial accuracy improvement over prior work.

The paper learns to pronounce Chinese text in Mandarin without a pronunciation dictionary by establishing a many-to-many mapping between characters and pronunciations from non-parallel streams of characters and pinyin syllables. It achieves a token-level character-to-syllable accuracy of 89%, significantly exceeding the 22% accuracy of prior work.

We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
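
To make the reported metric concrete, the following is a minimal sketch of how a token-level character-to-syllable accuracy could be computed. It assumes one predicted pinyin syllable per character token; the function name `token_accuracy` and the toy data are hypothetical, and the paper's exact evaluation protocol is not described in this summary.

```python
from typing import Sequence


def token_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of character tokens whose predicted syllable matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("expected exactly one syllable per character token")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical toy data (not from the paper). The character 行 is read
# "hang" in 银行 (bank) and "xing" in 行人 (pedestrian), which is why the
# mapping between characters and syllables is many-to-many.
chars = ["银", "行", "行", "人"]
reference = ["yin", "hang", "xing", "ren"]
predicted = ["yin", "xing", "xing", "ren"]  # one error, on the first 行

accuracy = token_accuracy(predicted, reference)
print(f"token-level accuracy: {accuracy:.2f}")
```

Because the score is computed per token rather than per character type, a frequent character that is mispronounced counts against the system every time it occurs.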
