CLOct 21, 2020

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

arXiv:2010.11054v1805 citations
Originality Incremental advance
AI Analysis

This work addresses decipherment challenges for historical linguists and archaeologists dealing with lost languages, but it is incremental as it builds on existing linguistic constraints.

The paper tackled the problem of deciphering ancient scripts that are not fully segmented and lack known related languages, by proposing a model that uses phonetic priors from the International Phonetic Alphabet to jointly handle word segmentation and cognate alignment. The results showed clear gains in deciphered languages like Gothic and Ugaritic, and for the undeciphered Iberian, it did not strongly support Basque as a related language, aligning with current scholarship.

Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes