CL, Sep 12, 2018

Multimodal neural pronunciation modeling for spoken languages with logographic origin

arXiv:1809.04203v1, 1089 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in pronunciation modeling for languages with logographic origins, such as Cantonese, offering incremental improvements for computational linguistics and speech technology applications.

The paper tackles the problem of predicting the pronunciation of logographic characters in Cantonese, a spoken language that lacks a standard written form and whose closest graphemic origins, Han characters, only partly encode pronunciation. It proposes a multimodal neural approach that uses geometric representations of logographs and the pronunciations of cognates in historically related languages. The method improves performance by 18.1% over unimodal baselines and 25.0% over multimodal baselines.

Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, and (2) their closest graphemic origins are logographic Han characters, only a subset of which implicitly encode pronunciation. In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and the pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% relative to unimodal and multimodal baselines, respectively.
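To make the idea of combining two modalities concrete, here is a minimal sketch of late fusion by concatenation. It is not the paper's architecture, which the abstract does not specify. The function names, feature sizes, weights, and the two-class output are all hypothetical, chosen only to show how a glyph-shape feature vector and a cognate-pronunciation feature vector could feed a single classifier.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_score(
    glyph_vec: Sequence[float],
    cognate_vec: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Concatenate two modality vectors and apply one linear layer plus softmax.

    Hypothetical helper: the real model may fuse modalities differently.
    """
    fused = list(glyph_vec) + list(cognate_vec)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs (assumed values, not data from the paper).
glyph_vec = [0.2, 0.7, 0.1]   # stand-in for a geometric glyph representation
cognate_vec = [1.0, 0.0]      # stand-in for a cognate pronunciation encoding
weights = [
    [0.5, 0.1, 0.0, 2.0, 0.0],  # scores for hypothetical phoneme class 0
    [0.0, 0.3, 0.2, 0.0, 2.0],  # scores for hypothetical phoneme class 1
]
bias = [0.0, 0.0]

probs = fuse_and_score(glyph_vec, cognate_vec, weights, bias)
print(probs)
```

In this toy setup the cognate feature dominates the score for class 0, which illustrates how evidence from a related language could tip a prediction that the glyph shape alone leaves ambiguous.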

Code Implementations: 1 repo
