Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection
This work addresses the time-consuming nature of linguistic fieldwork for language documentation, offering incremental improvements in data collection efficiency.
The paper addresses inefficient morphological data collection in linguistic fieldwork by introducing a model that guides linguists using sampling strategies and neural model predictions. Efficiency improves through uniform sampling across paradigm cells and through interactions guided by model confidence.
Linguistic fieldwork is an important component of language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving efficiency: (1) increasing the diversity of annotated data by sampling uniformly among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
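The two strategies can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the representation of a paradigm slot as a (lemma, cell) pair, and the 0.8 confidence threshold are all assumptions made for illustration. The first function picks slots to annotate by choosing a paradigm cell uniformly at random before choosing a lemma, so that rare cells are not crowded out. The second decides whether to show the speaker a model prediction for confirmation or to elicit the form directly.

```python
import random
from collections import defaultdict


def sample_uniform_over_cells(unlabeled, k, rng):
    """Pick up to k (lemma, cell) slots, choosing the paradigm cell uniformly first.

    `unlabeled` is a list of (lemma, cell) pairs that have no annotation yet.
    This is a hypothetical helper, not a function from the paper.
    """
    by_cell = defaultdict(list)
    for lemma, cell in unlabeled:
        by_cell[cell].append((lemma, cell))

    picked = []
    while len(picked) < k and by_cell:
        # Every cell that still has unlabeled slots is equally likely.
        cell = rng.choice(sorted(by_cell))
        slots = by_cell[cell]
        picked.append(slots.pop(rng.randrange(len(slots))))
        if not slots:
            del by_cell[cell]
    return picked


def choose_interaction(prediction, confidence, threshold=0.8):
    """Decide how to query the speaker for one slot.

    The threshold value is an assumption; the abstract does not state one.
    """
    if confidence >= threshold:
        # Confident prediction: ask the speaker only to confirm or correct it.
        return ("confirm", prediction)
    # Low confidence: ask the speaker to produce the form.
    return ("elicit", None)


# Toy paradigm: three lemmas crossed with three cells.
unlabeled = [(lemma, cell)
             for lemma in ["walk", "sing", "go"]
             for cell in ["PST", "PRS.3SG", "PTCP"]]
batch = sample_uniform_over_cells(unlabeled, 4, random.Random(0))
```

Choosing the cell before the lemma is one simple way to spread annotations across the paradigm table; other balancing schemes would serve the same goal.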