CL LG STML Jul 13, 2017

Learning Features from Co-occurrences: A Theoretical Analysis

arXiv:1707.04218v11089 citations
Originality: Synthesis-oriented
AI Analysis

This work provides incremental theoretical insights into feature learning for natural language processing, addressing a known challenge in the field.

The paper studies the theory behind representing words by their co-occurrences with context features. It analyzes how the choice of context feature and mapping function affects performance on a word classification task, and explains why multiple context features can outperform a single one.

Representing a word by its co-occurrences with other words in context is an effective way to capture the meaning of the word. However, the theory behind this approach remains a challenge. In this work, taking the example of a word classification task, we give a theoretical analysis of the approaches that represent a word X by a function f(P(C|X)), where C is a context feature, P(C|X) is the conditional probability estimated from a text corpus, and the function f maps the co-occurrence measure to a prediction score. We investigate the impact of the context feature C and the function f. We also explain why using the co-occurrences with multiple context features may be better than using a single one. In addition, some of the results shed light on the theory of feature learning and machine learning in general.
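To make the representation f(P(C|X)) concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's method: the toy corpus, the window-based definition of "co-occurrence", the helper names `cooccurrence_prob` and `represent`, and the smoothed-log choice of f are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy corpus: each sentence is a list of tokens.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the car needs fuel".split(),
    "the cat chases the dog".split(),
]


def cooccurrence_prob(corpus, word, context, window=2):
    """Estimate P(C|X) as the fraction of occurrences of `word` (X) that
    have `context` (C) within `window` tokens on either side.
    The window-based definition is an assumption of this sketch."""
    occurrences = 0
    hits = 0
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token != word:
                continue
            occurrences += 1
            left = sentence[max(0, i - window):i]
            right = sentence[i + 1:i + 1 + window]
            if context in left + right:
                hits += 1
    return hits / occurrences if occurrences else 0.0


def smoothed_log(p, eps=1e-6):
    """One possible mapping f (an assumption): a smoothed logarithm."""
    return math.log(p + eps)


def represent(corpus, word, contexts, f=smoothed_log):
    """Represent `word` by f(P(C|X)) for each context feature C."""
    return [f(cooccurrence_prob(corpus, word, c)) for c in contexts]


if __name__ == "__main__":
    for w in ("cat", "dog", "car"):
        p = cooccurrence_prob(corpus, w, "drinks")
        print(w, p, represent(corpus, w, ["drinks", "needs"]))
```

With a single context feature such as "drinks", "cat" and "dog" receive the same estimate while "car" receives zero. Passing several context features to `represent` yields one score per feature, which is the setting the abstract contrasts with using a single feature.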
