CL Nov 24, 2025

A symbolic Perl algorithm for the unification of Nahuatl word spellings

Juan-José Guzmán-Landa, Jesús Vázquez-Osorio, Juan-Manuel Torres-Moreno, Ligia Quintana Torres, Miguel Figueroa-Saavedra, Martha-Lorena Avendaño-Garrido, Graham Ranger, Patricia Velázquez-Morales, Gerardo Eugenio Sierra Martínez

arXiv:2511.19118v1

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem for researchers and practitioners working with Nahuatl-language texts. It is incremental, since it builds on existing algorithms and corpora.

The paper tackles the automatic unification of orthographic variants in Nahuatl text documents, using a symbolic algorithm that encodes linguistic rules as regular expressions. In a manual evaluation of the semantic quality of the unified sentences, the evaluators reported encouraging results.

In this paper, we describe a symbolic model for the automatic orthographic unification of Nawatl text documents. Our model is based on algorithms that we have previously used to analyze sentences in Nawatl, and on the $π$-yalli corpus, which consists of texts in several Nawatl orthographies. Our automatic unification algorithm implements linguistic rules as symbolic regular expressions. We also propose and implement a manual evaluation protocol that assesses the quality of the unified sentences generated by our algorithm through a sentence-level semantic task. We have obtained encouraging results from the evaluators for most of the desired features of our artificially unified sentences.
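To make the idea of "linguistic rules as regular expressions" concrete, here is a minimal sketch of an ordered list of rewrite rules applied to a word. It is written in Python for illustration only; the paper's implementation is in Perl. The rule set, the `RULES` table, and the `unify` function are hypothetical: they cover a few well-known spelling alternations (for example `hu` versus `w`, as in Nahuatl versus Nawatl) and are not the authors' rules.

```python
import re

# Hypothetical, illustrative rewrite rules; NOT the rule set from the paper.
# Each entry maps a spelling pattern to one target spelling.
RULES = [
    (re.compile(r"hu(?=[aeio])"), "w"),  # "hua" -> "wa"
    (re.compile(r"qu(?=[ei])"), "k"),    # "qui" -> "ki"
    (re.compile(r"c(?=[ao])"), "k"),     # "ca"  -> "ka"
    (re.compile(r"tz"), "ts"),           # "tz"  -> "ts"
]


def unify(text: str) -> str:
    """Lowercase the text, then apply every rule in order."""
    out = text.lower()
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


print(unify("Nahuatl"))
print(unify("Nawatl"))
```

Under these assumed rules, both spellings map to the same form, which is the property a unification step aims for. Rule order matters in a design like this, because an earlier substitution can create or remove the context that a later pattern depends on.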
