CL | Sep 3, 2015

Encoding Prior Knowledge with Eigenword Embeddings

arXiv:1509.01007v3 | 31 citations
Originality: Synthesis-oriented
AI Analysis

This work aims to improve word embeddings for natural language processing tasks. It appears incremental, since it builds on existing CCA-based methods.

The authors incorporate prior knowledge into canonical correlation analysis (CCA) for word embeddings and evaluate the resulting method on multiple datasets.

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.
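
To make the two-view setup concrete, here is a minimal sketch of plain CCA word embeddings, without the paper's prior-knowledge step. It assumes each view is a one-hot indicator (one for the word, one for its context), in which case the per-view covariance matrices are diagonal and CCA reduces to an SVD of the rescaled co-occurrence matrix. The function name `cca_word_embeddings`, the toy data, and the choice to skip mean-centering are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cca_word_embeddings(pairs, vocab_size, context_size, dim):
    """Hypothetical helper: CCA embeddings from (word_id, context_id) pairs."""
    # Co-occurrence counts between the word view and the context view.
    counts = np.zeros((vocab_size, context_size))
    for w, c in pairs:
        counts[w, c] += 1.0

    # With one-hot views, each view's (uncentered) covariance is diagonal
    # and holds the marginal counts, so whitening is a row/column rescaling.
    word_totals = np.maximum(counts.sum(axis=1), 1.0)
    context_totals = np.maximum(counts.sum(axis=0), 1.0)
    scaled = counts / np.sqrt(word_totals)[:, None] / np.sqrt(context_totals)[None, :]

    # Left singular vectors give one embedding row per word;
    # the singular values play the role of canonical correlations.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :dim], s[:dim]

# Toy corpus: words 0 and 1 share contexts 0 and 1; word 2 only sees context 2.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 2)]
embeddings, correlations = cca_word_embeddings(pairs, vocab_size=3, context_size=3, dim=2)
```

Each row of `embeddings` is the low-dimensional vector for one word. A method that encodes prior knowledge would modify the matrix being decomposed before the SVD; that step is not shown here.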

