CL · Aug 29, 2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

arXiv:1808.09733v132.21106 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the shortage of annotated data for part-of-speech tagging in hundreds of low-resource languages, which matters for natural language processing in under-resourced linguistic communities.

The paper tackles the problem of part-of-speech tagging for low-resource languages by developing a cross-lingual neural tagger that learns from disparate sources of distant supervision, achieving a new state of the art without using any gold annotated data.

We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.
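To make one of the listed ingredients concrete, here is a minimal sketch of annotation projection: tags from annotated source-language sentences are copied across word alignments onto a target-language sentence, and competing projections are combined by majority vote. This is an illustration under stated assumptions, not the DsDs implementation. The function name `project_tags`, the input format, and the plain majority vote are hypothetical choices made for the example; the abstract does not specify how projections are combined.

```python
from collections import Counter


def project_tags(target_len, projections):
    """Project POS tags onto a target sentence by majority vote.

    target_len: number of tokens in the target sentence.
    projections: list of (source_tags, alignment) pairs, one per aligned
        source sentence. `alignment` is a list of
        (source_index, target_index) word-alignment links.
    Returns a list of length target_len; tokens that receive no
    projected tag are left as None.
    """
    # One vote counter per target token.
    votes = [Counter() for _ in range(target_len)]
    for source_tags, alignment in projections:
        for src_i, tgt_i in alignment:
            votes[tgt_i][source_tags[src_i]] += 1
    # Keep the most frequent tag per token, or None if nothing was projected.
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Hypothetical example: three source sentences aligned to a 3-token target.
projections = [
    (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)]),
    (["DET", "NOUN", "NOUN"], [(0, 0), (1, 1), (2, 2)]),
    (["NOUN", "VERB"], [(0, 1), (1, 2)]),
]
projected = project_tags(3, projections)
```

Tokens left as None are exactly where other sources of distant supervision, such as tag dictionaries or morphological lexicons, could fill in; how the paper combines them is not stated in the abstract.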
