CL | Sep 22, 2024

Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection

Aso Mahmudi, Borja Herce, Demian Inostroza Amestica, Andreas Scherbakov, Eduard Hovy, Ekaterina Vylomova

arXiv:2409.14628v2 | 9.614 citations | h-index: 95 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the time-consuming nature of linguistic fieldwork for language documentation, offering incremental improvements in data collection efficiency.

The paper addresses the inefficiency of morphological data collection in linguistic fieldwork. It introduces a model that guides the linguist by combining sampling strategies with neural model predictions. Efficiency improves in two ways: sampling uniformly across the cells of the paradigm tables, and using model confidence to decide which predictions to offer during annotation.

Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during the fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving the efficiency: (1) increasing the diversity of annotated data by uniform sampling among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
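The two strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `sample_uniform_over_cells` and `split_by_confidence`, the `(lemma, cell)` data layout, the round-robin selection, and the 0.8 threshold are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_uniform_over_cells(pool, k, seed=0):
    """Pick up to k (lemma, cell) pairs, spreading picks evenly over cells.

    Hypothetical helper: `pool` is a list of (lemma, cell) pairs that have
    not been annotated yet. Round-robin selection is an assumed way to
    approximate uniform sampling among paradigm cells.
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for lemma, cell in pool:
        by_cell[cell].append((lemma, cell))
    for items in by_cell.values():
        rng.shuffle(items)  # random lemma order within each cell

    cells = sorted(by_cell)
    chosen = []
    # Each pass takes at most one item per cell, so cell counts differ by at most 1.
    while len(chosen) < k and any(by_cell[c] for c in cells):
        for cell in cells:
            if by_cell[cell] and len(chosen) < k:
                chosen.append(by_cell[cell].pop())
    return chosen


def split_by_confidence(predictions, threshold=0.8):
    """Split (item, predicted_form, confidence) triples by a confidence cutoff.

    Hypothetical helper: predictions at or above the threshold are offered
    to the speaker for confirmation; the rest are elicited directly.
    The threshold value is an assumption, not taken from the paper.
    """
    suggest = [p for p in predictions if p[2] >= threshold]
    elicit = [p for p in predictions if p[2] < threshold]
    return suggest, elicit
```

With a pool of four lemmas and three cells, `sample_uniform_over_cells(pool, 6)` returns two items per cell, whereas plain random sampling over the pool could leave some cells unseen.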
