CL, Feb 4, 2016

Many Languages, One Parser

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith

arXiv:1602.01595v4 | 26.2 | 232 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of parsing multiple languages with a single model, especially in low-resource scenarios. Its contribution is incremental: it combines existing techniques rather than introducing a new one.

The authors tackled multilingual dependency parsing by training a single model that uses multilingual word clusters, token-level language information, and language-specific features, achieving performance comparable to strong baselines across languages with varying amounts of training data.

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
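The input representation described in the abstract can be pictured as a per-token feature vector built by concatenating the three kinds of information. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the lookup tables, their toy values and dimensions, the cluster ID `c7`, the function name `token_features`, and the choice to key fine-grained POS tags by `(language, tag)` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lookup tables; values and dimensions are illustrative only.
# (i) Multilingual word embeddings and word clusters shared across languages.
WORD_EMB: Dict[str, List[float]] = {"dog": [0.1, 0.2], "perro": [0.1, 0.3]}
WORD_TO_CLUSTER: Dict[str, str] = {"dog": "c7", "perro": "c7"}
CLUSTER_EMB: Dict[str, List[float]] = {"c7": [1.0, 0.0]}
# (ii) Token-level language information.
LANG_EMB: Dict[str, List[float]] = {"en": [0.0, 1.0], "es": [1.0, 0.0]}
# (iii) Language-specific fine-grained POS tags (assumed keyed by language and tag).
FINE_POS_EMB: Dict[Tuple[str, str], List[float]] = {
    ("en", "NN"): [0.5],
    ("es", "nc"): [0.7],
}

UNK_2D = [0.0, 0.0]  # fallback for unknown words, clusters, or languages
UNK_1D = [0.0]       # fallback for unknown fine-grained POS tags


def token_features(word: str, lang: str, fine_pos: str) -> List[float]:
    """Concatenate the three feature groups into one vector for a token."""
    cluster: Optional[str] = WORD_TO_CLUSTER.get(word)
    return (
        WORD_EMB.get(word, UNK_2D)
        + CLUSTER_EMB.get(cluster, UNK_2D)
        + LANG_EMB.get(lang, UNK_2D)
        + FINE_POS_EMB.get((lang, fine_pos), UNK_1D)
    )


en_vec = token_features("dog", "en", "NN")
es_vec = token_features("perro", "es", "nc")
```

In this toy setup the English and Spanish tokens share the same cluster features while differing in their language and POS features, which is one way to picture how a shared model might generalize across languages and still keep language-specific signals.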
