CLLGMLSep 18, 2015

Word, graph and manifold embedding from Markov processes

arXiv:1509.05808v110 citations
Originality Incremental advance
AI Analysis

This work addresses the theoretical and practical understanding of embeddings for researchers in machine learning and cognitive science, offering a unified framework that is incremental but extends to new domains.

The paper tackles the problem of understanding and generalizing word embeddings by grounding them in cognitive-psychometric semantic spaces and unifying existing algorithms as consistent metric recovery methods from Markov random walks, proposing a new algorithm and extending it to graphs and manifolds, with comparisons across tasks including nonlinear dimensionality reduction and semantic language tasks.

Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric recovery as the key object of study, unify existing algorithms as consistent metric recovery methods based on co-occurrence counts from simple Markov random walks, and propose a new recovery algorithm. Third, we generalize metric recovery to graphs and manifolds, relating co-occurence counts on random walks in graphs and random processes on manifolds to the underlying metric to be recovered, thereby reconciling manifold estimation and embedding algorithms. We compare embedding algorithms across a range of tasks, from nonlinear dimensionality reduction to three semantic language tasks, including analogies, sequence completion, and classification.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes