CL · Sep 22, 2024

Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection

arXiv:2409.14628v2 · 14 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the time-consuming nature of linguistic fieldwork for language documentation, offering incremental improvements in data collection efficiency.

The paper tackles inefficient morphological data collection in linguistic fieldwork. It introduces a framework that guides the linguist by combining sampling strategies with neural model predictions, and finds that efficiency improves with uniform sampling across the cells of the paradigm tables and with using model confidence to offer reliable predictions during annotation.

Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during the fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving the efficiency: (1) increasing the diversity of annotated data by uniform sampling among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
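The two strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the `(lemma, cell)` pool representation, and the `0.9` threshold are all hypothetical, chosen only to show what "uniform sampling among the cells" and "model confidence as a guide" could look like in practice.

```python
import random
from collections import defaultdict


def sample_uniform_over_cells(pool, k, seed=0):
    """Draw up to k (lemma, cell) items without replacement.

    Each draw first picks a paradigm cell uniformly at random, then a
    lemma within that cell, so rare cells are not crowded out by
    frequent ones. `pool` is a hypothetical list of unannotated
    (lemma, cell) pairs.
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for lemma, cell in pool:
        by_cell[cell].append(lemma)

    picked = []
    while len(picked) < k and by_cell:
        cell = rng.choice(sorted(by_cell))      # uniform over remaining cells
        lemmas = by_cell[cell]
        lemma = lemmas.pop(rng.randrange(len(lemmas)))
        picked.append((lemma, cell))
        if not lemmas:                          # cell exhausted
            del by_cell[cell]
    return picked


def suggest_if_confident(prediction, confidence, threshold=0.9):
    """Offer the model's form only when its confidence clears the
    (assumed) threshold; otherwise return None so the linguist elicits
    the form from the speaker directly."""
    return prediction if confidence >= threshold else None
```

Under these assumptions, `sample_uniform_over_cells` would choose the next items to elicit, and `suggest_if_confident` would decide whether a predicted form is shown to the linguist for confirmation.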

Code Implementations: 1 repo
