cs.CL | Jan 25, 2024

Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis

arXiv:2401.14569v1 | 103 citations | LaTeCH-CLfL
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of language identification in historical documents, a problem relevant to linguists and archivists. It is incremental, however, because it applies existing methods to a specific domain.

The authors tackle the problem of identifying documents in Armeno-Turkish (Turkish written in the Armenian script), a historic language with a nonstandard language and script combination. They also introduce a new task for analyzing multilinguality in historical texts: detecting distinct patterns of multilinguality from the frequency of structured language alternations within a document.

In this study, we present a generalizable workflow to identify documents in a historic language with a nonstandard language and script combination, Armeno-Turkish. We introduce the task of detecting distinct patterns of multilinguality based on the frequency of structured language alternations within a document.
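The idea in the title, combining language identification with Fourier analysis, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that some language identifier has already assigned one label per line of a document (the per-line granularity and the label strings are hypothetical). The labels are encoded as a +1/-1 signal, the mean is removed, and the period of the strongest alternation is read off the peak of a discrete Fourier transform.

```python
import cmath
from typing import List, Optional, Sequence


def alternation_spectrum(labels: Sequence[str], lang_a: str) -> List[float]:
    """DFT magnitudes of a +1/-1 language-indicator signal.

    Assumption: `labels` holds one (hypothetical) language label per line;
    lines labelled `lang_a` map to +1, every other label maps to -1.
    """
    signal = [1.0 if lab == lang_a else -1.0 for lab in labels]
    n = len(signal)
    mean = sum(signal) / n
    # Remove the DC component so frequency 0 cannot dominate the spectrum.
    centered = [x - mean for x in signal]
    # Naive O(n^2) DFT over non-negative frequencies k = 0 .. n // 2.
    return [
        abs(sum(centered[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_period(labels: Sequence[str], lang_a: str) -> Optional[float]:
    """Period (in lines) of the strongest alternation, or None if there is none."""
    n = len(labels)
    if n < 2:
        return None
    spectrum = alternation_spectrum(labels, lang_a)
    # Skip k = 0 and pick the frequency bin with the largest magnitude.
    k = max(range(1, len(spectrum)), key=lambda i: spectrum[i])
    if spectrum[k] < 1e-9:
        return None  # flat signal, e.g. a monolingual document
    return n / k


# Hypothetical labels: "hy" and "tr_armn" are illustrative placeholders.
print(dominant_period(["hy", "tr_armn"] * 8, "hy"))              # line-by-line switching
print(dominant_period(["hy", "hy", "tr_armn", "tr_armn"] * 4, "hy"))  # two-line blocks
print(dominant_period(["tr_armn"] * 10, "hy"))                   # monolingual
```

A document that switches language on every line peaks at a period of 2 lines, one that alternates in two-line blocks peaks at 4, and a monolingual document yields no peak. The naive DFT keeps the sketch dependency-free; for long documents a library FFT such as `numpy.fft.rfft` would be the usual choice.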

