CL, Jul 9, 2018

Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation

arXiv:1807.03121v3, 1157 citations
AI Analysis

This work improves dependency parsing accuracy, which matters to NLP researchers and practitioners, but its contribution is incremental: it builds directly on an existing system.

The paper tackles multilingual parsing from raw text to Universal Dependencies by extending Stanford's winning system from the CoNLL 2017 shared task with deep contextualized word embeddings, parser ensembling, and treebank concatenation. It ranked first in the CoNLL 2018 shared task with an LAS of 75.84%.

This paper describes our system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We base our submission on Stanford's winning system for the CoNLL 2017 shared task and make two effective extensions: 1) incorporating deep contextualized word embeddings into both the part of speech tagger and parser; 2) ensembling parsers trained with different initialization. We also explore different ways of concatenating treebanks for further improvements. Experimental results on the development data show the effectiveness of our methods. In the final evaluation, our system was ranked first according to LAS (75.84%) and outperformed the other systems by a large margin.

Code Implementations: 1 repo