CL, May 21, 2020

Evaluating Neural Morphological Taggers for Sanskrit

Ashim Gupta, Amrith Krishna, Pawan Goyal, Oliver Hellwig

arXiv:2005.10893v1

Originality: Synthesis-oriented

AI Analysis

This work addresses morphological tagging for Sanskrit, an under-resourced language, but is incremental as it applies existing methods to a new dataset.

The paper evaluates four neural sequence labelling models for morphological tagging in Sanskrit, a morphologically rich language whose label space can theoretically exceed 40,000 labels, and finds that syncretism is a common cause of errors across all models.

Neural sequence labelling approaches have achieved state-of-the-art results in morphological tagging. We evaluate the efficacy of four standard sequence labelling models on Sanskrit, a morphologically rich, fusional Indian language. As its label space can theoretically contain more than 40,000 labels, systems that explicitly model the internal structure of a label are better suited to the task, because they can generalise to labels not seen during training. We find that although some neural models perform better than others, a common cause of error for all of these models is misprediction due to syncretism.
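
The generalisation argument in the abstract can be illustrated with a small sketch. It is not the paper's method or tagset: the feature inventory, the label string format, and the helper names `compose` and `decompose` are assumptions made for illustration, covering only nominal case, number, and gender. The sketch compares the labels reachable by a tagger that treats each label as one atomic class with those reachable by a tagger that predicts each feature separately.

```python
from itertools import product

# Hypothetical, heavily simplified feature inventory (nominal forms only);
# the real Sanskrit tagset is far larger and is not reproduced here.
FEATURES = {
    "case": ["nom", "acc", "ins", "dat", "abl", "gen", "loc", "voc"],
    "number": ["sg", "du", "pl"],
    "gender": ["m", "f", "n"],
}


def compose(case, number, gender):
    """Join feature values into one composite label string (assumed format)."""
    return f"case={case}|number={number}|gender={gender}"


def decompose(label):
    """Split a composite label back into a feature -> value mapping."""
    return dict(part.split("=") for part in label.split("|"))


# Full composite label space for this toy inventory: 8 * 3 * 3 combinations.
all_labels = [compose(*combo) for combo in product(*FEATURES.values())]

# Toy training data that happens to contain only three composite labels.
train_labels = {
    compose("nom", "sg", "m"),
    compose("acc", "pl", "f"),
    compose("gen", "du", "n"),
}

# A monolithic classifier can only emit labels it saw during training.
monolithic_outputs = set(train_labels)

# A factored tagger predicts each feature on its own, so any combination
# of feature values seen in training is a possible output.
seen_values = {feature: set() for feature in FEATURES}
for label in train_labels:
    for feature, value in decompose(label).items():
        seen_values[feature].add(value)

factored_outputs = {
    compose(c, n, g)
    for c in seen_values["case"]
    for n in seen_values["number"]
    for g in seen_values["gender"]
}

# This composite label never occurs in the toy training data.
unseen = compose("nom", "pl", "n")

print(len(all_labels), len(monolithic_outputs), len(factored_outputs))
print(unseen in monolithic_outputs, unseen in factored_outputs)
```

In this toy setting the factored tagger can produce a composite label that never occurred in training, while the monolithic one cannot. Factoring does nothing for syncretism, though: when one surface form is compatible with several feature values, the ambiguity has to be resolved from context under either label representation.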
