Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis
This work addresses the challenge of language identification in historical documents, a problem relevant to linguists and archivists, but its contribution is incremental: it applies existing methods to a specific domain.
The authors address the problem of identifying documents in Armeno-Turkish, a historic language with a nonstandard language and script combination, by developing a workflow that detects structured language alternations from their frequency patterns. They also introduce a new task: analyzing patterns of multilinguality in historical texts.
In this study, we present a generalizable workflow to identify documents in a historic language with a nonstandard language and script combination, Armeno-Turkish. We introduce the task of detecting distinct patterns of multilinguality based on the frequency of structured language alternations within a document.
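The idea of measuring how regularly languages alternate within a document can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a language identification step has already produced one label per line; the labels "hy" and "tr" and the function name `dominant_period` are hypothetical. The labels for one target language are turned into a 0/1 signal, and a plain discrete Fourier transform finds the strongest repetition period.

```python
import cmath


def dominant_period(labels, target):
    """Return (period_in_lines, power) for the strongest alternation of `target`.

    `labels` is a sequence of per-line language labels (assumed to come from
    an upstream language identifier). Returns (None, 0.0) when the target
    never alternates, e.g. for an empty or monolingual document.
    """
    n = len(labels)
    if n == 0:
        return None, 0.0

    # Indicator signal: 1.0 where the line is in the target language.
    signal = [1.0 if lab == target else 0.0 for lab in labels]

    # Remove the mean so the zero-frequency term does not dominate.
    mean = sum(signal) / n
    centered = [x - mean for x in signal]

    best_k, best_power = 0, 0.0
    # Naive O(n^2) DFT over the positive frequencies up to Nyquist.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    if best_k == 0:
        return None, 0.0
    # Frequency index k corresponds to a period of n / k lines.
    return n / best_k, best_power


if __name__ == "__main__":
    # Hypothetical document: two lines of one language, then two of another.
    labels = ["hy", "hy", "tr", "tr"] * 4
    print(dominant_period(labels, "hy"))
```

A sharp peak at a short period suggests a regularly interleaved document, such as a line-by-line bilingual text, while a flat spectrum suggests irregular or no alternation. The naive transform is used here only to keep the sketch dependency-free; an FFT routine would be the usual choice for long documents.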