CL · Sep 3, 2025

LatPhon: Lightweight Multilingual G2P for Romance Languages and English

arXiv:2509.03300v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

LatPhon provides a compact, multilingual G2P solution for speech processing tasks such as TTS and ASR across Romance languages and English. The advance is incremental, since it builds on existing Transformer and G2P approaches.

The paper tackles grapheme-to-phoneme conversion for multiple Latin-script languages with LatPhon, a lightweight Transformer model with 7.5M parameters. On the ipa-dict corpus it achieves a mean phoneme error rate of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%), while using only 30 MB of memory.
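
As a rough consistency check on the reported footprint, the sketch below estimates weight storage from the parameter count. It assumes 32-bit floats (4 bytes per parameter), which the summary does not state; the function name `param_memory_mb` is hypothetical and used only for illustration.

```python
def param_memory_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Estimate weight storage in megabytes (1 MB = 10**6 bytes).

    bytes_per_param=4 assumes fp32 weights; this is an assumption,
    not something the paper summary confirms.
    """
    return n_params * bytes_per_param / 1e6


# 7.5M parameters at 4 bytes each
fp32_mb = param_memory_mb(7_500_000)
# The same count at 2 bytes each (fp16), shown only for comparison
fp16_mb = param_memory_mb(7_500_000, bytes_per_param=2)
print(f"fp32: {fp32_mb:.1f} MB, fp16: {fp16_mb:.1f} MB")
```

Under the fp32 assumption, 7.5M parameters come to about 30 MB, which matches the reported figure. This counts weights only and ignores activations and runtime overhead.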

Grapheme-to-phoneme (G2P) conversion is a key front-end for text-to-speech (TTS), automatic speech recognition (ASR), speech-to-speech translation (S2ST) and alignment systems, especially across multiple Latin-script languages. We present LatPhon, a 7.5M-parameter Transformer jointly trained on six such languages: English, Spanish, French, Italian, Portuguese, and Romanian. On the public ipa-dict corpus, it attains a mean phoneme error rate (PER) of 3.5%, outperforming the byte-level ByT5 baseline (5.4%) and approaching language-specific WFSTs (3.2%) while occupying 30 MB of memory, which makes on-device deployment feasible when needed. These results indicate that compact multilingual G2P can serve as a universal front-end for Latin-language speech pipelines.
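
For readers unfamiliar with the metric, the sketch below shows one common way to compute a phoneme error rate: the Levenshtein edit distance between predicted and reference phoneme sequences, summed over the test set and divided by the total number of reference phonemes. Whether the paper uses exactly this aggregation is an assumption; the function names and the toy word pairs are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs) -> float:
    """PER over (reference, hypothesis) phoneme-sequence pairs."""
    total_ref = sum(len(ref) for ref, _ in pairs)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return total_errors / total_ref


# Toy data, not taken from ipa-dict: one substitution in six reference phonemes.
pairs = [
    (["k", "æ", "t"], ["k", "a", "t"]),
    (["d", "ɔ", "g"], ["d", "ɔ", "g"]),
]
print(f"PER = {phoneme_error_rate(pairs):.3f}")
```

A "mean PER" across languages, as reported above, would then be the average of the per-language values, assuming each language is scored on its own test split.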

