CL · May 14, 2018

Parser Training with Heterogeneous Treebanks

Sara Stymne, Miryam de Lhoneux, Aaron Smith, Joakim Nivre

arXiv:1805.05089v132.21110 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of combining heterogeneous treebanks to improve dependency parsing accuracy, an incremental advance in parser training methods.

The paper studies how to train monolingual dependency parsers on multiple heterogeneous treebanks and finds that, in many cases, fine-tuning and treebank embeddings yield substantial improvements over single treebanks or concatenation, with average gains of 2.0-3.5 LAS points.

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0-3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
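As a rough illustration of the treebank-embedding idea, the sketch below tags every token with a vector identifying the treebank its sentence came from. It assumes, without confirmation from the abstract, that this vector is simply concatenated to the word embedding before the parser sees it. The class `TreebankEmbeddingInput`, the treebank names, the dimensions, and the random initialisation are all hypothetical; in a real parser both embedding tables would be learned jointly with the model.

```python
import random


class TreebankEmbeddingInput:
    """Hypothetical helper: builds token inputs as [word embedding ; treebank embedding]."""

    def __init__(self, vocab, treebank_ids, word_dim=4, treebank_dim=2, seed=0):
        rng = random.Random(seed)
        # Randomly initialised tables stand in for parameters a parser would train.
        self.word_emb = {
            w: [rng.uniform(-1.0, 1.0) for _ in range(word_dim)] for w in vocab
        }
        self.treebank_emb = {
            t: [rng.uniform(-1.0, 1.0) for _ in range(treebank_dim)]
            for t in treebank_ids
        }
        # Unknown words map to a zero vector in this sketch.
        self.unk = [0.0] * word_dim

    def encode(self, tokens, treebank_id):
        """Return one input vector per token, each tagged with the treebank vector."""
        tb = self.treebank_emb[treebank_id]
        return [self.word_emb.get(tok, self.unk) + tb for tok in tokens]


# Two hypothetical treebanks for the same language share one word-embedding table.
encoder = TreebankEmbeddingInput(
    vocab=["the", "dog", "barks"],
    treebank_ids=["treebank_a", "treebank_b"],
)
from_a = encoder.encode(["the", "dog"], "treebank_a")
from_b = encoder.encode(["the", "dog"], "treebank_b")
```

Under this assumption the word part of each vector is identical across treebanks and only the trailing treebank part differs. A single model can therefore be trained on all treebanks at once while still conditioning on annotation style, and at test time one picks the treebank identifier whose conventions the output should follow.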
