CL · LG · Jun 1, 2020

Attention Word Embedding

arXiv:2006.00988v1991 citations
Originality: Incremental advance
AI Analysis

This paper addresses a limitation of word embedding models that matters to NLP practitioners, though the advance is incremental because it builds on the existing CBOW method.

The authors tackle the inefficiency of CBOW word embeddings by introducing Attention Word Embedding (AWE), which integrates an attention mechanism into CBOW, and AWE-S, which additionally incorporates subword information. They report that both models outperform state-of-the-art word embedding models on word similarity datasets and when used to initialize NLP models.

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as a context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform the state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
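
The contrast the abstract draws, equal weighting of context words versus attention weighting, can be illustrated with a minimal sketch. This is a generic dot-product attention example, not the paper's actual formulation: the abstract does not say how AWE parameterises its attention, so the key vectors, the query vector, the toy values, and the function names below are all hypothetical.

```python
import math


def uniform_context(vectors):
    """CBOW-style context vector: a plain average, so every context word counts equally."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(vectors, keys, query):
    """Attention-weighted context vector (illustrative assumption, not the paper's model).

    Each context word gets a weight from the dot product of its hypothetical
    key vector with a hypothetical query vector, normalised by softmax.
    """
    scores = [sum(k_d * q_d for k_d, q_d in zip(key, query)) for key in keys]
    weights = softmax(scores)
    dim = len(vectors[0])
    context = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return context, weights


# Toy data: three context words with 2-dimensional embeddings (made-up values).
context_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_keys = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # hypothetical keys
query = [1.0, 0.0]                                   # hypothetical query

avg = uniform_context(context_vectors)
ctx, weights = attention_context(context_vectors, context_keys, query)
print("uniform average:  ", avg)
print("attention weights:", weights)
print("attention context:", ctx)
```

In this toy setup the first context word receives the largest weight because its key aligns with the query, while the uniform average treats all three words identically. This is the inefficiency the abstract attributes to CBOW.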

