CL · Oct 16, 2020

SIGTYP 2020 Shared Task: Prediction of Typological Features

arXiv:2010.08246v2999 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the issue of incomplete typological data for linguists and NLP researchers, but it is incremental as it builds on existing prediction methods.

The paper tackles the problem of sparsely populated typological knowledge bases by organizing a shared task on predicting missing linguistic features. The task attracted 8 submissions from 5 teams. The most successful methods exploited correlations between features, but even they struggled with languages that have few known features.

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.
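
The idea that correlated features can be used to fill in missing ones can be illustrated with a minimal sketch. This is not any submitted system from the shared task: it is a simple co-occurrence voting baseline written for illustration, and the feature names, values, and the five training "languages" below are invented toy data rather than WALS entries.

```python
from collections import Counter, defaultdict


def train_cooccurrence(languages):
    """Count how often each known (feature, value) pair co-occurs
    with every value of every other feature."""
    counts = defaultdict(Counter)
    for feats in languages:
        for f_a, v_a in feats.items():
            for f_b, v_b in feats.items():
                if f_a != f_b:
                    counts[(f_a, v_a, f_b)][v_b] += 1
    return counts


def predict(counts, known, target):
    """Predict the value of `target` by summing co-occurrence votes
    from every known feature of the language. Returns None when the
    known features provide no evidence at all."""
    votes = Counter()
    for feature, value in known.items():
        votes.update(counts.get((feature, value, target), Counter()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each dict is one language's known features.
train = [
    {"object_verb": "OV", "adposition": "postpositions"},
    {"object_verb": "OV", "adposition": "postpositions"},
    {"object_verb": "VO", "adposition": "prepositions"},
    {"object_verb": "VO", "adposition": "prepositions"},
    {"object_verb": "OV", "adposition": "prepositions"},
]

counts = train_cooccurrence(train)

# A language with one known feature: the correlated feature can be guessed.
print(predict(counts, {"object_verb": "OV"}, "adposition"))

# A language with no known features: there is nothing to vote with.
print(predict(counts, {}, "adposition"))
```

The second call shows, in miniature, why languages with few known features are hard: a predictor that relies on feature correlations has little or no evidence to work from.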

