CL | LG | Oct 1, 2019

Specializing Word Embeddings (for Parsing) by Information Bottleneck

arXiv:1910.00163v1 | 1042 citations
Originality: Incremental advance
AI Analysis

This work targets parsing speed and accuracy for NLP practitioners. It is an incremental advance because it builds on existing pre-trained embeddings.

The paper compresses pre-trained word embeddings such as ELMo and BERT with a variational information bottleneck (VIB) method to improve parsing accuracy. The continuous version yields more accurate parsers in 8 of 9 languages, and the discrete version produces tags that capture most of the information in traditional POS tags.

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.
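The continuous variant described in the abstract can be illustrated with a minimal sketch of a generic VIB-style loss. This is not the authors' implementation: the encoder shape, the standard-normal prior, the helper names (`encode`, `sample`, `kl_to_standard_normal`, `vib_loss`), and the placeholder `task_loss` value (standing in for a parser's negative log-likelihood) are all assumptions made for illustration. The paper's actual variational bounds and architecture may differ.

```python
import math
import random


def encode(x, w_mu, w_logvar):
    """Hypothetical nonlinear stochastic encoder.

    Maps an embedding x to the mean and log-variance of a diagonal
    Gaussian over the compressed vector t.
    """
    h = [math.tanh(v) for v in x]  # simple nonlinearity
    mu = [sum(w * hv for w, hv in zip(row, h)) for row in w_mu]
    logvar = [sum(w * hv for w, hv in zip(row, h)) for row in w_logvar]
    return mu, logvar


def sample(mu, logvar, rng):
    """Reparameterized sample: t = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)): the compression penalty."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def vib_loss(task_loss, kl, beta):
    """Task loss plus beta times the compression penalty (assumed weighting)."""
    return task_loss + beta * kl


rng = random.Random(0)
d_in, d_out = 8, 2  # compress an 8-dim toy embedding to 2 dims
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
w_mu = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
w_logvar = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]

mu, logvar = encode(x, w_mu, w_logvar)
t = sample(mu, logvar, rng)          # compressed representation fed to the parser
kl = kl_to_standard_normal(mu, logvar)
loss = vib_loss(task_loss=1.25, kl=kl, beta=0.1)  # 1.25 is a placeholder value
```

In this sketch, a larger `beta` pushes the encoder toward the prior and discards more of the embedding, while a smaller `beta` keeps more information for the downstream parser.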

Code Implementations: 1 repo
