CL · May 2, 2020

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

arXiv:2005.00800v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of adapting dependency parsers to new domains, offering an incremental improvement over existing methods.

The paper tackles out-of-domain dependency parsing by extending treebank embedding vectors in two ways: predicting a vector for sentences that do not come from any training treebank, and exploring interpolations between the predefined vectors. It shows that some interpolated vectors outperform the predefined ones, and that predicted vectors match the performance of an oracle for nine out of ten test languages.

A recent advance in monolingual dependency parsing is the idea of a treebank embedding vector, which allows all treebanks for a particular language to be used as training data while at the same time allowing the model to prefer training data from one treebank over others and to select the preferred treebank at test time. We build on this idea by 1) introducing a method to predict a treebank vector for sentences that do not come from a treebank used in training, and 2) exploring what happens when we move away from predefined treebank embedding vectors during test time and instead devise tailored interpolations. We show that 1) there are interpolated vectors that are superior to the predefined ones, and 2) treebank vectors can be predicted with sufficient accuracy, for nine out of ten test languages, to match the performance of an oracle approach that knows the most suitable predefined treebank embedding for the test set.
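The abstract describes two operations on treebank vectors: interpolating between the predefined vectors at test time, and predicting a vector for a sentence from an unseen domain. Below is a minimal Python sketch of both ideas under stated assumptions, not the authors' implementation. The treebank names and 4-dimensional vectors are hypothetical; interpolation is assumed to be a normalised weighted average of the predefined vectors; and `predict_vector` is a hypothetical stand-in that turns per-treebank scores for a sentence (for example from some domain classifier) into interpolation weights with a softmax. The abstract does not say how the paper itself predicts vectors. Following the treebank-embedding setup this work builds on, we also assume the vector is concatenated to each token's input representation.

```python
import math

# Hypothetical learned treebank embedding vectors for three treebanks of one
# language (real parsers use larger dimensions and learn these in training).
TREEBANK_VECTORS = {
    "tb_news": [1.0, 0.0, 0.5, -0.5],
    "tb_web": [0.0, 1.0, -0.5, 0.5],
    "tb_legal": [0.5, 0.5, 1.0, 1.0],
}


def interpolate(weights):
    """Weighted average of predefined treebank vectors.

    `weights` maps treebank name -> non-negative weight. Weights are
    normalised to sum to 1, so the result lies between the predefined vectors.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(next(iter(TREEBANK_VECTORS.values())))
    out = [0.0] * dim
    for name, w in weights.items():
        vec = TREEBANK_VECTORS[name]
        for i in range(dim):
            out[i] += (w / total) * vec[i]
    return out


def softmax(scores):
    """Numerically stable softmax over a dict of scores."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def predict_vector(sentence_scores):
    """Hypothetical predictor: map per-treebank scores for one test sentence
    to interpolation weights, then to an interpolated treebank vector."""
    return interpolate(softmax(sentence_scores))


def parser_input(word_vec, tb_vec):
    """Assumed input layout: treebank vector concatenated to a token vector."""
    return word_vec + tb_vec


if __name__ == "__main__":
    # Equal mix of two treebanks.
    print(interpolate({"tb_news": 1, "tb_web": 1}))
    # A sentence the (hypothetical) classifier finds equally close to all three.
    v = predict_vector({"tb_news": 0.0, "tb_web": 0.0, "tb_legal": 0.0})
    print(parser_input([0.1, 0.2], v))
```

A one-hot weight dictionary such as `{"tb_news": 1.0}` recovers a predefined vector, so choosing the single best treebank (the oracle setting in the abstract) is a special case of interpolation; any other weighting gives a vector between the predefined ones.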

Code implementations: 1 repo