CL · LG · Nov 24, 2019

Causally Denoise Word Embeddings Using Half-Sibling Regression

arXiv:1911.10524v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the need for more interpretable and transparent improvements to word vectors in natural language processing, though it is an incremental contribution to a growing line of postprocessing algorithms.

The paper tackles the problem of noise in word embeddings by introducing a postprocessing method based on Half-Sibling Regression and grounded in a causal inference framework, achieving state-of-the-art performance on lexical-level and sentiment analysis tasks.

Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithms has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), which allows us to identify and remove confounding noise contained in word vectors. Compared to previous work, our proposed method has the advantages of interpretability and transparency due to its causal inference grounding. Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance.
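The generic Half-Sibling Regression idea is to predict a target Y from "half-sibling" observations X that share the confounding noise but not the target's own signal, then subtract that prediction (an estimate of E[Y | X]) from Y. The block below is a minimal sketch of that step, not the authors' released code. It assumes a linear ridge regressor, and it assumes one plausible instantiation for word vectors: rows are embedding dimensions, the half-sibling columns are stop-word vectors, and the target columns are content-word vectors. The helper name `half_sibling_denoise`, the parameter `alpha`, and the synthetic data are hypothetical.

```python
import numpy as np


def half_sibling_denoise(targets: np.ndarray, siblings: np.ndarray,
                         alpha: float = 1.0) -> np.ndarray:
    """Hypothetical helper: return targets minus their ridge prediction from siblings.

    Both arrays share axis 0 (the samples). Under the assumed setup, samples are
    embedding dimensions, `siblings` holds stop-word vectors as columns, and
    `targets` holds content-word vectors as columns.
    """
    if targets.shape[0] != siblings.shape[0]:
        raise ValueError("targets and siblings must have the same number of rows")
    k = siblings.shape[1]
    # Ridge regression in closed form: W = (X^T X + alpha * I)^{-1} X^T Y.
    gram = siblings.T @ siblings + alpha * np.eye(k)
    weights = np.linalg.solve(gram, siblings.T @ targets)
    # HSR step: subtract the part of Y explained by X (an estimate of E[Y | X]).
    return targets - siblings @ weights


# Toy demo with synthetic data (not real embeddings).
rng = np.random.default_rng(0)
dim, n_stop, n_content = 50, 20, 100
stop_vecs = rng.normal(size=(dim, n_stop))
# Confounding noise that is visible through the siblings (lies in their span).
shared_noise = stop_vecs @ rng.normal(size=(n_stop, n_content))
signal = rng.normal(size=(dim, n_content))
content_vecs = signal + shared_noise

# alpha=0.0 reduces the ridge fit to ordinary least squares.
denoised = half_sibling_denoise(content_vecs, stop_vecs, alpha=0.0)
```

With `alpha=0.0` and noise that lies exactly in the span of `stop_vecs`, the output equals the component of `signal` orthogonal to that span: the confound is removed, along with any part of the signal the siblings can also explain. A positive `alpha` shrinks the regression weights, which would matter when the number of sibling words approaches or exceeds the number of rows; how the original method chooses its regressor and regularization is not stated here.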

Code Implementations: 1 repo
