CL, Dec 16, 2015

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

arXiv:1512.05030v3, 24 citations
Originality: Incremental advance
AI Analysis

The paper addresses the lack of comprehensive morpho-syntactic lexicons for many languages, which in turn enables better NLP tools. The advance is incremental because it builds on existing semi-supervised techniques.

The paper tackles the problem of generating morpho-syntactic lexicons for languages with limited resources by developing a graph-based semi-supervised learning method. The method expands a 1000-word seed lexicon to more than 100 times its size with high quality for 11 languages, and the resulting lexicons improve performance in morphological tagging and dependency parsing.

Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lexicons from small seed sets. Our method is language-independent, and we show that we can expand a 1000 word seed lexicon to more than 100 times its size with high quality for 11 languages. In addition, the automatically created lexicons provide features that improve performance in two downstream tasks: morphological tagging and dependency parsing.
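
The general idea of growing a lexicon from a small seed over a word graph can be illustrated with plain iterative label propagation. The sketch below is an assumption-laden toy, not the paper's model: the edge list, the attribute names (`POS:Noun`, `NUM:Plural`, `POS:Verb`), the helper names (`build_graph`, `propagate`, `to_lexicon`), and the 0.5 threshold are all hypothetical, and the update rule is a simple weighted average of neighbour scores with seed words held fixed.

```python
def build_graph(edges):
    """Build a symmetric weighted graph from (word_a, word_b, weight) triples."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def propagate(graph, seeds, attributes, iterations=20):
    """Spread attribute scores from seed words to their neighbours.

    Seed words keep their known attributes (clamped); every other word
    takes the weighted average of its neighbours' scores each round.
    """
    scores = {w: {a: 0.0 for a in attributes} for w in graph}
    for word, attrs in seeds.items():
        if word in graph:
            scores[word] = {a: (1.0 if a in attrs else 0.0) for a in attributes}

    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            total = sum(neighbours.values())
            if word in seeds or total == 0:
                updated[word] = scores[word]  # clamped seed or isolated word
                continue
            updated[word] = {
                a: sum(wt * scores[n][a] for n, wt in neighbours.items()) / total
                for a in attributes
            }
        scores = updated
    return scores


def to_lexicon(scores, threshold=0.5):
    """Keep every attribute whose propagated score reaches the threshold."""
    return {
        word: {a for a, s in attr_scores.items() if s >= threshold}
        for word, attr_scores in scores.items()
    }


# Hypothetical toy data: two small clusters of related words.
edges = [
    ("cats", "dogs", 1.0),
    ("dogs", "horses", 1.0),
    ("run", "walk", 1.0),
    ("walk", "jump", 1.0),
]
attributes = ["POS:Noun", "NUM:Plural", "POS:Verb"]
seeds = {"cats": {"POS:Noun", "NUM:Plural"}, "run": {"POS:Verb"}}

graph = build_graph(edges)
lexicon = to_lexicon(propagate(graph, seeds, attributes))
for word in sorted(lexicon):
    print(word, sorted(lexicon[word]))
```

In this toy setup two seed words are enough to label the four unlabelled ones, because each cluster is connected to exactly one seed. A real graph would mix several edge types (morphological, syntactic, semantic, as the abstract describes), so the weights and the threshold would need tuning.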

