CL · FL · Oct 22, 2023

4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees

arXiv:2310.14319v1 · 133 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses dependency parsing for NLP, offering an incremental improvement in encoding efficiency and coverage.

The paper tackles dependency parsing by introducing a 4-bit encoding for projective trees and a 7-bit extension for non-projective trees, achieving substantial accuracy gains over previous sequence labeling methods on diverse treebanks.
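Here is what "an extra plane of arcs" means in practice. Place the dummy root before the first word. A dependency tree is then projective exactly when none of its arcs cross, and a plane is a group of arcs in which no two cross. The sketch below splits a tree's arcs into two such planes.

It rests on several assumptions not taken from the paper:
- The source does not say how arcs are assigned to planes. A simple greedy rule stands in here: use the first plane where the arc fits, else the second, else drop it.
- The names `crosses` and `greedy_two_planes` are hypothetical.
- The 7-bit labels themselves are not reproduced.

```python
from typing import List, Tuple

Arc = Tuple[int, int]  # (left endpoint, right endpoint); 0 is the dummy root


def crosses(a: Arc, b: Arc) -> bool:
    """Arcs cross iff their endpoints strictly interleave (shared endpoints don't count)."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def greedy_two_planes(heads: List[int]) -> Tuple[List[Arc], List[Arc], List[Arc]]:
    """Hypothetical greedy plane assignment (an assumption, not the paper's method).

    heads[i - 1] is the head of word i. Each arc goes into the first plane where
    it crosses no arc already there; arcs that fit in neither plane are dropped.
    """
    plane1: List[Arc] = []
    plane2: List[Arc] = []
    dropped: List[Arc] = []
    for dep, h in enumerate(heads, start=1):
        arc = (min(dep, h), max(dep, h))
        if not any(crosses(arc, x) for x in plane1):
            plane1.append(arc)
        elif not any(crosses(arc, x) for x in plane2):
            plane2.append(arc)
        else:
            dropped.append(arc)
    return plane1, plane2, dropped


# Illustrative tree with three mutually crossing arcs: (1, 4), (2, 5), (3, 6).
heads = [0, 1, 2, 1, 2, 3]
p1, p2, lost = greedy_two_planes(heads)
print(p1)    # [(0, 1), (1, 2), (2, 3), (1, 4)]
print(p2)    # [(2, 5)]
print(lost)  # [(3, 6)]
print(f"arc coverage: {1 - len(lost) / len(heads):.3f}")  # arc coverage: 0.833
```

Arcs that fit in neither plane are the ones a two-plane representation cannot express. The over 99.9% arc coverage reported in the abstract is an empirical figure over treebanks, and this toy example does not attempt to reproduce it.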

We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word's label represent (1) whether it is a right or left dependent, (2) whether it is the outermost (left/right) dependent of its parent, (3) whether it has any left children and (4) whether it has any right children. We show that this provides an injective mapping from trees to labels that can be encoded and decoded in linear time. We then define a 7-bit extension that represents an extra plane of arcs, extending the coverage to almost full non-projectivity (over 99.9% empirical arc coverage). Results on a set of diverse treebanks show that our 7-bit encoding obtains substantial accuracy gains over the previously best-performing sequence labeling encodings.
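The four bits described above can be computed directly from a head array, and decoding can be done in a single left-to-right pass. The following is an illustrative sketch under stated assumptions, not the authors' code.

- Trees are given as a list `heads`, where `heads[i - 1]` is the head of word `i` and `0` is a dummy root.
- The function names `encode_4bit` and `decode_4bit`, the bit order inside each tuple, and the two-stack decoder are hypothetical choices that may differ from the paper's algorithm.
- The decoder assumes well-formed labels produced from a projective tree.

```python
from typing import List, Optional, Tuple

# Hypothetical label layout: (is_right_dependent, is_outermost, has_left_children, has_right_children)
Label = Tuple[int, int, int, int]


def encode_4bit(heads: List[int]) -> List[Label]:
    """heads[i - 1] is the head of word i; words are 1..n and 0 is the dummy root."""
    n = len(heads)
    head = [-1] + list(heads)  # 1-indexed view; slot 0 is unused
    leftmost_left: List[Optional[int]] = [None] * (n + 1)
    rightmost_right: List[Optional[int]] = [None] * (n + 1)
    has_left = [False] * (n + 1)
    has_right = [False] * (n + 1)
    for w in range(1, n + 1):  # scan left to right
        h = head[w]
        if h > w:  # left dependent: the first one seen is the leftmost
            has_left[h] = True
            if leftmost_left[h] is None:
                leftmost_left[h] = w
        else:  # right dependent (root dependents too): the last one seen is the rightmost
            has_right[h] = True
            rightmost_right[h] = w
    labels: List[Label] = []
    for w in range(1, n + 1):
        h = head[w]
        is_right = h < w
        outermost = rightmost_right[h] == w if is_right else leftmost_left[h] == w
        labels.append((int(is_right), int(outermost), int(has_left[w]), int(has_right[w])))
    return labels


def decode_4bit(labels: List[Label]) -> List[int]:
    """Rebuild the head list in one left-to-right pass using two stacks."""
    n = len(labels)
    head = [0] * (n + 1)
    right_stack = [0]  # nodes still expecting right dependents (root first)
    left_stack: List[int] = []  # words still waiting for a head to their right
    for w in range(1, n + 1):
        is_right, outermost, has_left, has_right = labels[w - 1]
        if has_left:
            # w's pending left dependents sit on top of the stack, nearest first;
            # attach them until reaching the one flagged as outermost (leftmost).
            while True:
                d = left_stack.pop()
                head[d] = w
                if labels[d - 1][1]:
                    break
        if is_right:
            # In a projective tree, the open node on top of the stack is w's head.
            head[w] = right_stack[-1]
            if outermost:  # w is its head's last right dependent
                right_stack.pop()
        else:
            left_stack.append(w)
        if has_right:
            right_stack.append(w)
    return head[1:]


# Illustrative annotation of "The cat sat on mats" (heads chosen for the example).
heads = [2, 3, 0, 5, 3]
labels = encode_4bit(heads)
print(labels)  # [(0, 1, 0, 0), (0, 1, 1, 0), (1, 1, 1, 1), (0, 1, 0, 0), (1, 1, 1, 0)]
print(decode_4bit(labels) == heads)  # True
```

Each word is pushed onto and popped from each stack at most once, so the pass is linear in sentence length, consistent with the linear-time claim. If the encoder is applied to a non-projective tree it still emits labels, but decoding them need not recover the original heads and may fail outright.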
