CL | Aug 29, 2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

arXiv:1808.09733v1 | 106 citations
Originality: Highly original
AI Analysis

This paper addresses the scarcity of annotated data for part-of-speech tagging in hundreds of low-resource languages, a significant advance for natural language processing in under-resourced linguistic communities.

The paper tackles the problem of part-of-speech tagging for low-resource languages by developing a cross-lingual neural tagger that learns from disparate sources of distant supervision, achieving a new state of the art without using any gold annotated data.

We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.
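To make the distant-supervision sources in the abstract more concrete, here is a minimal sketch of how three of them might be prepared as input for a neural tagger. It is not the authors' implementation. All function names are hypothetical, and several choices are assumptions: multi-source projection by majority vote over word alignments, instance selection by a simple projection-coverage threshold, a 17-tag Universal POS inventory, and tag-dictionary information encoded as an n-hot vector concatenated to a word embedding. The neural tagger itself (for example a bi-LSTM) is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Assumption: Universal Dependencies POS tagset (17 tags) as the tag inventory.
UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

Alignment = List[Tuple[int, int]]  # (source token index, target token index)


def project_tags(sources: Sequence[Tuple[List[str], Alignment]],
                 tgt_len: int) -> List[Optional[str]]:
    """Annotation projection (hypothetical helper): every aligned source token
    votes for its tag; the most frequent tag wins. Unaligned target tokens
    receive None."""
    votes: List[Counter] = [Counter() for _ in range(tgt_len)]
    for src_tags, alignment in sources:
        for s, t in alignment:
            votes[t][src_tags[s]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def coverage(tags: List[Optional[str]]) -> float:
    """Fraction of target tokens that received a projected tag."""
    return sum(t is not None for t in tags) / len(tags) if tags else 0.0


def select_instances(corpus: List[Tuple[List[str], List[Optional[str]]]],
                     threshold: float = 0.5):
    """Instance selection (assumed criterion): keep only sentences whose
    projection coverage reaches the threshold."""
    return [(toks, tags) for toks, tags in corpus if coverage(tags) >= threshold]


def lexicon_features(word: str, tag_dict: Dict[str, Set[str]]) -> List[float]:
    """n-hot vector over UPOS marking the tags a type-level dictionary allows
    for this word; all zeros when the word is not in the dictionary."""
    allowed = tag_dict.get(word.lower(), set())
    return [1.0 if tag in allowed else 0.0 for tag in UPOS]


def word_representation(word: str, embeddings: Dict[str, List[float]],
                        tag_dict: Dict[str, Set[str]], dim: int) -> List[float]:
    """Concatenate a (hypothetical) pretrained embedding with the lexicon
    features; unknown words fall back to a zero embedding."""
    emb = embeddings.get(word.lower(), [0.0] * dim)
    return list(emb) + lexicon_features(word, tag_dict)


# Toy usage: project English and German tags onto a Norwegian sentence.
target = ["hunden", "bjeffer", "høyt"]
english = (["DET", "NOUN", "VERB"], [(1, 0), (2, 1)])
german = (["DET", "NOUN", "VERB"], [(0, 0), (1, 0), (2, 1)])
tags = project_tags([english, german], len(target))
print(tags)  # ['NOUN', 'VERB', None]
print(select_instances([(target, tags)], threshold=0.5))
```

In the toy example, the first Norwegian token gets two NOUN votes and one DET vote, so NOUN wins, and the unaligned third token stays untagged. The sentence's coverage of 2/3 passes a 0.5 threshold but would fail a stricter one. The appeal of this kind of uniform framework is that each supervision source becomes either training labels (projection and selection) or extra input features (dictionaries and embeddings) for the same tagger.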

Code Implementations: 1 repo
