CL LG ML

Apr 1, 2018

Revisiting Skip-Gram Negative Sampling Model with Rectification

arXiv:1804.00306v21.713 citations

Originality Synthesis-oriented

AI Analysis

This paper addresses a fundamental problem in word embedding models for natural language processing, but the contribution is incremental because it modifies an existing method.

The paper tackles the ambiguity issue in skip-gram negative sampling (SGNS), in which word vectors can be distorted without changing the objective value. It rectifies the model with quadratic regularization so that solutions have the desired structure, and reports preliminary experimental support on Google's analytical reasoning task.

We revisit skip-gram negative sampling (SGNS), one of the most popular neural-network-based approaches to learning distributed word representations. We first point out the ambiguity issue undermining the SGNS model, in the sense that the word vectors can be entirely distorted without changing the objective value. To resolve the issue, we investigate the intrinsic structures in the solution that a good word embedding model should deliver. Motivated by this, we rectify the SGNS model with quadratic regularization, and show that this simple modification suffices to structure the solution in the desired manner. A theoretical justification is presented, which provides novel insights into quadratic regularization. Preliminary experiments are also conducted on Google's analytical reasoning task to support the modified SGNS model.
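
The ambiguity described in the abstract can be illustrated with a minimal sketch. SGNS scores a word-context pair only through the inner product of its two vectors, so rescaling each coordinate of the word vectors by some factor and each coordinate of the context vectors by the inverse factor leaves the objective unchanged, while a quadratic penalty on the vectors does change. The toy weights, the dimensions, the helper names (`sgns_objective`, `quadratic_penalty`) and the plain sum-of-squares form of the penalty are assumptions made for illustration, not the paper's exact formulation.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_objective(W, C, pos, neg):
    # Toy SGNS-style objective (assumed form): pos[i][j] weights
    # log sigma(w_i . c_j), neg[i][j] weights log sigma(-w_i . c_j).
    total = 0.0
    for i, w in enumerate(W):
        for j, c in enumerate(C):
            s = dot(w, c)
            total += pos[i][j] * log_sigmoid(s) + neg[i][j] * log_sigmoid(-s)
    return total


def quadratic_penalty(W, C):
    # Sum of squared entries of all word and context vectors (assumed form).
    return sum(x * x for v in W for x in v) + sum(x * x for v in C for x in v)


rng = random.Random(0)
n_words, n_contexts, dim = 5, 4, 3

# Random toy vectors and toy positive/negative weights (hypothetical data).
W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_words)]
C = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_contexts)]
pos = [[rng.randint(0, 5) for _ in range(n_contexts)] for _ in range(n_words)]
neg = [[rng.uniform(0, 5) for _ in range(n_contexts)] for _ in range(n_words)]

# Distort the embeddings: scale word coordinates, inverse-scale context ones.
# Every inner product w_i . c_j is preserved up to floating-point rounding.
scales = [10.0, 0.1, 3.0]
W2 = [[x * s for x, s in zip(w, scales)] for w in W]
C2 = [[x / s for x, s in zip(c, scales)] for c in C]

obj_before = sgns_objective(W, C, pos, neg)
obj_after = sgns_objective(W2, C2, pos, neg)
pen_before = quadratic_penalty(W, C)
pen_after = quadratic_penalty(W2, C2)

print("objective before/after:", obj_before, obj_after)
print("penalty   before/after:", pen_before, pen_after)
```

In this sketch the two objective values agree up to rounding even though the word vectors have been stretched along some axes and squashed along others, whereas the penalty values differ. That difference is what lets a regularized objective distinguish between solutions the unregularized one treats as equivalent.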
