CL, May 14, 2018

Parser Training with Heterogeneous Treebanks

arXiv:1805.05089v11110 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of combining diverse treebanks to improve dependency parsing accuracy, an incremental advance in parser training methods.

The paper tackles the problem of training a monolingual dependency parser on multiple heterogeneous treebanks. It finds that, in many cases, fine-tuning and treebank embeddings yield substantial improvements over single treebanks or plain concatenation, with average gains of 2.0 to 3.5 LAS (labeled attachment score) points.

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0–3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
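
The idea of a treebank embedding can be illustrated with a minimal sketch. It assumes that each treebank gets its own vector, which is concatenated to the input representation of every token from that treebank. The abstract does not spell out the mechanism, so this is an assumption for illustration rather than the authors' implementation. The dimensions, the vocabulary, the treebank names, and the helpers `make_table` and `encode` are all hypothetical, and random vectors stand in for learned parameters.

```python
import random

# Hypothetical sizes, chosen only for illustration.
WORD_DIM = 4
TREEBANK_DIM = 2


def make_table(keys, dim, rng):
    """Build a randomly initialised embedding table (stand-in for learned vectors)."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


rng = random.Random(0)
word_emb = make_table(["the", "dog", "barks"], WORD_DIM, rng)
# One vector per training treebank; the names are placeholders.
treebank_emb = make_table(["treebank_A", "treebank_B"], TREEBANK_DIM, rng)


def encode(sentence, treebank_id):
    """Concatenate each word vector with the vector of the sentence's source treebank."""
    tb_vec = treebank_emb[treebank_id]
    return [word_emb[word] + tb_vec for word in sentence]


sentence = ["the", "dog", "barks"]
as_a = encode(sentence, "treebank_A")
as_b = encode(sentence, "treebank_B")

# Each token vector now has WORD_DIM + TREEBANK_DIM entries.
print(len(as_a[0]))
```

Under this assumption, the word part of each vector is shared by all treebanks and only the short treebank part differs, so a single parser can be trained on the concatenated data while still being told which treebank each sentence comes from.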

Code Implementations: 1 repo