CL · Nov 24, 2021

For the Purpose of Curry: A UD Treebank for Ashokan Prakrit

arXiv:2111.12783v2648 citations
Originality: Synthesis-oriented
AI Analysis

The treebank provides a foundational resource for historical linguistics and computational analysis of Indo-Aryan languages, though the contribution is incremental in that it applies existing annotation methods to new data.

The authors created the first linguistically annotated treebank for Ashokan Prakrit, an early Middle Indo-Aryan dialect continuum attested in 3rd-century BCE edicts, using the Universal Dependencies formalism to enable computational study of language change in Indo-Aryan.

We present the first linguistically annotated treebank of Ashokan Prakrit, an early Middle Indo-Aryan dialect continuum attested through Emperor Ashoka Maurya's 3rd century BCE rock and pillar edicts. For annotation, we used the multilingual Universal Dependencies (UD) formalism, following recent UD work on Sanskrit and other Indo-Aryan languages. We touch on some interesting linguistic features that posed issues in annotation: regnal names and other nominal compounds, "proto-ergative" participial constructions, and possible grammaticalizations evidenced by sandhi (phonological assimilation across morpheme boundaries). Eventually, we plan for a complete annotation of all attested Ashokan texts, towards the larger goals of improving UD coverage of different diachronic stages of Indo-Aryan and studying language change in Indo-Aryan using computational methods.
