CLLGMay 10, 2025

Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

arXiv:2505.18159v111 citationsh-index: 5Has CodeProceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
Originality Synthesis-oriented
AI Analysis

It addresses the problem of endangered language preservation for linguistic researchers and communities, though it is incremental as it applies existing methods to a new language.

This study tackled the digital exclusion of the endangered Comanche language by introducing the first computational investigation, using minimal-cost NLP interventions to support preservation, and found that few-shot prompting with GPT-4o models significantly improved language identification accuracy to near-perfect levels with just five examples.

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes