CL · Jun 10, 2018

Unsupervised Disambiguation of Syncretism in Inflected Lexicons

arXiv:1806.03740v2 · 1093 citations
Originality: Incremental advance
AI Analysis

This paper addresses a challenge in computational linguistics that matters to researchers and practitioners working with inflected languages. The advance is incremental, since it builds on existing unsupervised learning techniques.

The paper tackles lexical ambiguity in morphological analysis by developing an unsupervised method that probabilistically assigns each word form to its possible morphological feature bundles, and it reports results on 5 languages.

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
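
To make the count-partitioning idea concrete, here is a minimal EM sketch in Python. It is not the paper's model: the neural prior is swapped for a plain categorical prior over feature bundles, and each bundle is assumed to emit its compatible forms uniformly. The function name `em_disambiguate`, the toy counts, and the bundle labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def em_disambiguate(counts, analyses, iterations=50, eps=1e-9):
    """Partition each unigram type count among its candidate feature bundles.

    counts:   dict mapping word form -> observed count
    analyses: dict mapping word form -> list of candidate bundles
    Assumption (not from the paper): categorical prior over bundles,
    uniform emission over the forms compatible with each bundle.
    """
    bundles = sorted({b for cands in analyses.values() for b in cands})
    # Number of forms each bundle can surface as (uniform emission model).
    n_forms = defaultdict(int)
    for cands in analyses.values():
        for b in cands:
            n_forms[b] += 1

    prior = {b: 1.0 / len(bundles) for b in bundles}

    def posterior(form):
        # p(bundle | form) is proportional to prior(bundle) / n_forms(bundle).
        weights = {b: prior[b] / n_forms[b] for b in analyses[form]}
        z = sum(weights.values())
        return {b: w / z for b, w in weights.items()}

    for _ in range(iterations):
        # E-step: fractional counts for every (form, bundle) pair.
        expected = defaultdict(float)
        for form, c in counts.items():
            for b, p in posterior(form).items():
                expected[b] += c * p
        # M-step: re-estimate the prior, lightly smoothed to avoid zeros.
        total = sum(expected.values()) + eps * len(bundles)
        prior = {b: (expected[b] + eps) / total for b in bundles}

    partition = {
        form: {b: c * p for b, p in posterior(form).items()}
        for form, c in counts.items()
    }
    return prior, partition

# Toy data: "walked" is syncretic between past tense and past participle.
counts = {"walked": 60, "walk": 30, "eaten": 10, "ate": 20}
analyses = {
    "walked": ["PST", "PST.PTCP"],
    "walk": ["NFIN"],
    "eaten": ["PST.PTCP"],
    "ate": ["PST"],
}
prior, partition = em_disambiguate(counts, analyses)
print(partition["walked"])
```

In this toy setup the unambiguous forms "ate" and "eaten" pull the prior toward PST, so the 60 tokens of "walked" should settle near a 40/20 split between PST and PST.PTCP. Because only type counts are used, no token context is needed, which mirrors the property the abstract highlights.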

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.