CL · May 16, 2024

Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language

Togay Yazar, Mucahid Kutlu, İsa Kerem Bayırlı
arXiv:2405.10133v1 · 4 citations · h-index: 13 · Language Resources and Evaluation
Originality: Synthesis-oriented
AI Analysis

This work provides a quantitative resource for analyzing linguistic change in Turkish, aimed primarily at linguists and historians. Its contribution is incremental, since it builds on existing corpora and methods.

The study examines the evolution of the Turkish language since 1923. It introduces Turkronicles, a diachronic corpus of 45,375 documents from the Official Gazette, and expands an existing corpus. The analysis shows that vocabulary divergence increases over time, with new words replacing old ones, and that writing conventions have changed, including decreased circumflex use and letter replacements.

Over the past century, the Turkish language has undergone substantial changes, primarily driven by governmental interventions. In this work, our goal is to investigate the evolution of the Turkish language since the establishment of Türkiye in 1923. To this end, we first introduce Turkronicles, a diachronic corpus for Turkish derived from the Official Gazette of Türkiye. Turkronicles contains 45,375 documents detailing governmental actions, making it a pivotal resource for analyzing linguistic evolution influenced by state policies. In addition, we expand an existing diachronic Turkish corpus, which consists of the records of the Grand National Assembly of Türkiye, by covering additional years. Next, combining these two diachronic corpora, we seek answers to two main research questions: How have the Turkish vocabulary and the writing conventions changed since the 1920s? Our analysis reveals that the vocabularies of two time periods diverge more as the time between them increases, and that newly coined Turkish words take the place of their older counterparts. We also observe changes in writing conventions. In particular, the use of the circumflex noticeably decreases, and words ending with the letters "-b" and "-d" are successively replaced with forms ending in "-p" and "-t", respectively. Overall, this study quantitatively highlights the dramatic changes in Turkish across various aspects of the language from a diachronic perspective.
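The abstract does not say which metrics the authors use, so the following is only a minimal sketch of how the three reported observations could be measured on two text samples. Jaccard distance is an assumed stand-in for vocabulary divergence, and every function name (`turkish_lower`, `tokenize`, `jaccard_distance`, `circumflex_rate`, `final_devoicing_pairs`) is hypothetical rather than taken from the paper or its repository.

```python
import re
import unicodedata

# Circumflexed vowels in Turkish orthography (text is lowercased first).
CIRCUMFLEX = set("âîû")


def turkish_lower(text):
    """Lowercase with the Turkish dotted/dotless I mapping (İ -> i, I -> ı)."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text):
    """Normalize to NFC, lowercase, and keep runs of letters as tokens."""
    text = unicodedata.normalize("NFC", text)
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def jaccard_distance(vocab_a, vocab_b):
    """Assumed divergence measure: 1 - |A ∩ B| / |A ∪ B|."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return 1.0 - len(vocab_a & vocab_b) / len(union)


def circumflex_rate(tokens):
    """Share of tokens that contain at least one circumflexed vowel."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if CIRCUMFLEX & set(t)) / len(tokens)


def final_devoicing_pairs(vocab_old, vocab_new):
    """Old words ending in -b/-d whose -p/-t spelling occurs in the new vocabulary."""
    swap = {"b": "p", "d": "t"}
    pairs = []
    for word in vocab_old:
        if word[-1] in swap:
            candidate = word[:-1] + swap[word[-1]]
            if candidate in vocab_new:
                pairs.append((word, candidate))
    return sorted(pairs)


# Illustrative sentences only; they are not drawn from either corpus.
old_tokens = tokenize("Kitab ve kâğıd hâlâ mekteb içindedir")
new_tokens = tokenize("Kitap ve kağıt hala okul içindedir")

print(jaccard_distance(set(old_tokens), set(new_tokens)))
print(circumflex_rate(old_tokens), circumflex_rate(new_tokens))
print(final_devoicing_pairs(set(old_tokens), set(new_tokens)))
```

On real data these functions would be applied per year or per decade, so that divergence can be compared against the time gap between periods. The spelling check only matches exact variants, so a word that changes in more than its final letter is missed.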

Code Implementations: 1 repo
