CL · May 4, 2020

The Paradigm Discovery Problem

arXiv:2005.01630
AI Analysis

This paper addresses a core challenge in computational linguistics, the learning of inflectional morphology from unannotated text, though the contribution appears incremental because it builds on existing resources and methods.

The paper tackles the paradigm discovery problem: learning inflectional morphological systems from unannotated sentences. It develops evaluation metrics, datasets, and a benchmark system that uses word embeddings and neural transducers, and it reports empirical results on five languages.

This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work. Our code and data are available for public use.
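
To make the clustering step described in the abstract more concrete, here is a minimal sketch of grouping word forms into candidate paradigms by string similarity alone. It is not the authors' benchmark system: it ignores word embeddings, does not cluster by cell, and does not include the neural transducer. The function names, the use of `difflib` as the similarity measure, and the 0.7 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_by_string_similarity(forms, threshold=0.7):
    """Single-link clustering of word forms (hypothetical helper).

    Any two forms whose similarity meets the threshold (an arbitrary,
    illustrative value) end up in the same candidate paradigm.
    """
    # Union-find structure: each form starts as its own cluster.
    parent = {form: form for form in forms}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently similar pair.
    for a, b in combinations(forms, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    # Collect the forms under each root and return them in sorted order.
    clusters = {}
    for form in forms:
        clusters.setdefault(find(form), set()).add(form)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


forms = ["walk", "walks", "walked", "walking",
         "jump", "jumps", "jumped", "jumping"]
for cluster in cluster_by_string_similarity(forms):
    print(cluster)
```

On this toy English input the forms separate into one group for "walk" and one for "jump". Surface similarity of this kind breaks down for suppletive or heavily irregular forms, so the sketch only illustrates grouping forms by paradigm. Grouping by cell across inflection classes, which the abstract identifies as the most pressing challenge, needs more than string overlap.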

Code Implementations: 1 repo