CL · LG · Aug 29, 2023

Taxonomic Loss for Morphological Glossing of Low-Resource Languages

arXiv:2308.15055v1 · 0.91 citations · h-index: 21 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automated language documentation for low-resource languages. It is an incremental advance: it offers a specific improvement for human-in-the-loop annotation rather than broad state-of-the-art gains.

The paper tackles morphological glossing for low-resource languages by proposing a taxonomic loss function that leverages morphological information. Compared with a standard loss function, it improves top-n prediction accuracy but not single-label accuracy.
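To make the distinction between the two metrics concrete, here is a minimal sketch of top-n accuracy: an example counts as correct if its gold label appears anywhere among the n highest-scoring labels, and n = 1 reduces to ordinary single-label accuracy. The function name and the toy scores are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def top_n_accuracy(
    scores: Sequence[Sequence[float]],
    gold: Sequence[int],
    n: int = 1,
) -> float:
    """Fraction of examples whose gold label is among the n highest-scoring labels.

    With n=1 this is ordinary single-label accuracy.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not gold:
        return 0.0
    hits = 0
    for row, label in zip(scores, gold):
        # Label indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(gold)


if __name__ == "__main__":
    # Toy scores over four hypothetical gloss labels for three morphemes.
    scores = [
        [0.1, 0.6, 0.2, 0.1],  # gold label 1 is ranked first
        [0.5, 0.1, 0.3, 0.1],  # gold label 2 is ranked second
        [0.4, 0.3, 0.2, 0.1],  # gold label 3 is ranked last
    ]
    gold = [1, 2, 3]
    print(top_n_accuracy(scores, gold, n=1))  # 1 of 3 correct
    print(top_n_accuracy(scores, gold, n=2))  # 2 of 3 correct
```

A model can therefore leave top-1 accuracy unchanged while moving correct labels higher in the ranked list, which is the kind of difference that matters when an annotator picks from a short list of suggestions.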

Morpheme glossing is a critical task in automated language documentation and can greatly benefit other downstream applications. While state-of-the-art glossing systems perform very well for languages with large amounts of existing data, it is more difficult to create useful models for low-resource languages. In this paper, we propose the use of a taxonomic loss function that exploits morphological information to make morphological glossing more performant when data is scarce. We find that while this loss function does not outperform a standard loss function with regard to single-label prediction accuracy, it produces better predictions when considering the top-n predicted labels. We suggest this property makes the taxonomic loss function useful in a human-in-the-loop annotation setting.
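The abstract does not spell out how the taxonomic loss is formulated. As one illustration of the general idea only, the sketch below assumes a two-level taxonomy in which every fine-grained gloss label belongs to a coarse category (for example, person/number glosses versus tense glosses). The function names, the `parent_of` mapping, the `parent_weight` value, and the choice to score the parent by summing its children's probabilities are all assumptions, not the paper's method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def taxonomic_loss(
    logits: Sequence[float],
    gold: int,
    parent_of: Sequence[int],
    parent_weight: float = 0.5,
) -> float:
    """Hypothetical two-level loss: fine-label cross-entropy plus a weighted
    cross-entropy on the gold label's coarse category.

    parent_of[i] is the (assumed) coarse category of fine label i. The
    category probability is the sum of its children's probabilities, so
    probability mass on a sibling of the gold label is penalized less than
    mass on a label from another category.
    """
    probs = softmax(logits)
    fine_nll = -math.log(probs[gold])
    gold_parent = parent_of[gold]
    parent_prob = sum(p for p, c in zip(probs, parent_of) if c == gold_parent)
    parent_nll = -math.log(parent_prob)
    return fine_nll + parent_weight * parent_nll


if __name__ == "__main__":
    # Hypothetical labels: 0 "1SG", 1 "2SG" (person/number); 2 "PST", 3 "FUT" (tense).
    parent_of = [0, 0, 1, 1]
    gold = 0  # "1SG"
    near_miss = taxonomic_loss([1.0, 3.0, 0.0, 0.0], gold, parent_of)  # favors "2SG"
    far_miss = taxonomic_loss([1.0, 0.0, 3.0, 0.0], gold, parent_of)  # favors "PST"
    print(near_miss < far_miss)
```

Both toy predictions assign the gold label the same probability, so plain cross-entropy cannot tell them apart; the category term prefers the mistake that stays within the right category. Setting `parent_weight=0` recovers ordinary cross-entropy.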
