CL · AI · May 14, 2018

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

arXiv:1805.05388v132.31120 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient and effective embedding induction in natural language processing, particularly for domain adaptation and transfer learning. The advance is incremental because it builds on existing theoretical results.

The paper tackles the problem of inducing embeddings for rare or unseen textual features by introducing a la carte embedding, a method based on a linear transformation learned from pretrained word vectors. The method needs fewer context examples to learn high-quality embeddings and achieves state-of-the-art results on a nonce task and some unsupervised document classification tasks.

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
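The abstract's core idea, a linear transform fit by regression on pretrained word vectors and then applied to a new feature's contexts, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: representing a word's context as the plain average of its neighbors' vectors, the window size, and the helper names `context_average`, `learn_transform`, and `induce` are all choices made here for illustration.

```python
import numpy as np

def context_average(corpus, vectors, target, window=2):
    """Average the pretrained vectors of words within `window` tokens of `target`.

    Assumption: a plain unweighted average over all occurrences.
    Returns a zero vector if `target` never occurs with a known neighbor.
    """
    dim = len(next(iter(vectors.values())))
    total = np.zeros(dim)
    count = 0
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j] in vectors:
                    total += vectors[sentence[j]]
                    count += 1
    return total / count if count else total

def learn_transform(corpus, vectors, window=2):
    """Fit a matrix A by least squares so that A @ context_average(w) ~ vectors[w]."""
    words = [w for w in vectors if any(w in s for s in corpus)]
    U = np.stack([context_average(corpus, vectors, w, window) for w in words])
    V = np.stack([vectors[w] for w in words])
    # lstsq solves U @ X ~ V in the least-squares sense; A is the transpose of X.
    X, *_ = np.linalg.lstsq(U, V, rcond=None)
    return X.T

def induce(corpus, vectors, A, new_word, window=2):
    """Embed an unseen word by transforming its averaged context vector."""
    return A @ context_average(corpus, vectors, new_word, window)

# Toy usage with random stand-ins for pretrained vectors (hypothetical data).
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=4) for w in ["the", "cat", "sat", "on", "mat", "dog"]}
toy_corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
    ["a", "wug", "sat", "on", "the", "mat"],  # "wug" has no pretrained vector
]
A = learn_transform(toy_corpus, toy_vectors)
wug_vector = induce(toy_corpus, toy_vectors, A, "wug")
```

Because the transform is fit once on words that already have vectors, embedding a new word afterwards costs one context average and one matrix-vector product, which is consistent with the abstract's claim that the method works on the fly from as little as a single usage example.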
