CL · Jun 10, 2018

Unsupervised Disambiguation of Syncretism in Inflected Lexicons

Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner

arXiv:1806.03740v232.01093 citations

Originality: Incremental advance

AI Analysis

This work addresses a challenge in computational linguistics that matters to researchers and practitioners working with inflected languages. It is an incremental advance, since it builds on existing unsupervised learning techniques.

The paper tackles lexical ambiguity in morphological analysis: it develops an unsupervised method that probabilistically disambiguates each word form among its possible morphological feature bundles, and it reports results on 5 languages.

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
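
As a rough illustration of the count-partitioning idea in the abstract, the sketch below runs an EM-style loop over a list of unigram type counts. It is not the authors' model: the paper uses a neural network to model the prior over feature bundles, whereas this sketch swaps in a plain categorical prior and assumes the likelihood term is the same for every candidate analysis of a form. The function name `em_disambiguate`, the bundle labels, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def em_disambiguate(counts, analyses, iterations=50):
    """EM-style sketch (assumption: categorical prior, not the paper's neural prior).

    counts:   dict mapping word form -> unigram type count
    analyses: dict mapping word form -> list of candidate feature bundles
    Returns the estimated prior over bundles and each form's partitioned count.
    """
    bundles = sorted({b for cands in analyses.values() for b in cands})
    # Start from a uniform prior over all feature bundles.
    prior = {b: 1.0 / len(bundles) for b in bundles}

    def posterior(form):
        # Posterior over a form's candidate bundles, proportional to the prior
        # (assumption: likelihood is constant across the form's candidates).
        cands = analyses[form]
        z = sum(prior[b] for b in cands)
        return {b: prior[b] / z for b in cands}

    for _ in range(iterations):
        # E-step: split each form's count among its candidate bundles.
        expected = defaultdict(float)
        for form, count in counts.items():
            for b, p in posterior(form).items():
                expected[b] += count * p
        # M-step: re-estimate the prior from the expected counts.
        total = sum(expected.values())
        prior = {b: expected[b] / total for b in bundles}

    split = {
        form: {b: count * p for b, p in posterior(form).items()}
        for form, count in counts.items()
    }
    return prior, split


# Hypothetical toy data: "walked" is syncretic between past tense and
# past participle, while "went" and "gone" are unambiguous.
counts = {"walked": 60, "went": 30, "gone": 10}
analyses = {
    "walked": ["V;PST", "V;PTCP"],
    "went": ["V;PST"],
    "gone": ["V;PTCP"],
}
prior, split = em_disambiguate(counts, analyses)
```

In this toy setting the unambiguous forms pull the prior toward the past-tense bundle, so most of the count for "walked" is assigned to that analysis. Because no token context is used, the only input is the list of type counts, which matches the property the abstract highlights.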
