Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling
This work addresses a fundamental property of word embedding models and offers insight into why they perform well across a variety of tasks, though the contribution is incremental in nature.
Using simulations and an empirical study, the paper shows that Skip-Gram with Negative Sampling captures second-order co-occurrence information much as Singular Value Decomposition does, whereas Pointwise Mutual Information does not. The empirical study finds that the models react differently when given additional second-order information.
We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our findings reveal a basic property of Skip-Gram with Negative Sampling and point towards an explanation of its success on a variety of tasks.
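The contrast between first- and second-order context overlap can be illustrated with a toy example. The sketch below is not the authors' simulation: the five-word vocabulary, the co-occurrence pairs, the use of positive PMI (PPMI) instead of raw PMI, and the rank `k = 2` are all assumptions chosen for illustration, and Skip-Gram with Negative Sampling itself is left out because it would require training. The target words `a` and `b` never share a context word (no first-order overlap), but their respective contexts `c1` and `c2` both co-occur with `d` (second-order overlap).

```python
import numpy as np

def ppmi(counts):
    """Positive PMI of a co-occurrence count matrix; zero counts map to 0."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):  # log(0) = -inf is clipped to 0 below
        pmi = np.log(counts * total / (row * col))
    return np.maximum(pmi, 0.0)

def svd_vectors(matrix, k):
    """Rank-k word vectors U_k * S_k from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :k] * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vocabulary and symmetric co-occurrence pairs (assumed).
words = ["a", "b", "c1", "c2", "d"]
index = {w: i for i, w in enumerate(words)}
pairs = [("a", "c1"), ("b", "c2"), ("c1", "d"), ("c2", "d")]

counts = np.zeros((len(words), len(words)))
for w1, w2 in pairs:
    counts[index[w1], index[w2]] += 1
    counts[index[w2], index[w1]] += 1

ppmi_matrix = ppmi(counts)
svd_matrix = svd_vectors(ppmi_matrix, k=2)

a, b = index["a"], index["b"]
ppmi_cos = cosine(ppmi_matrix[a], ppmi_matrix[b])  # sparse PPMI rows
svd_cos = cosine(svd_matrix[a], svd_matrix[b])     # dense rank-2 rows

print(f"PPMI cosine(a, b): {ppmi_cos:.3f}")
print(f"SVD  cosine(a, b): {svd_cos:.3f}")
```

In this toy setup the PPMI rows of `a` and `b` have no non-zero column in common, so their cosine is 0, while the rank-2 SVD vectors of the two words come out nearly identical because the low-rank projection merges the paths `a`-`c1`-`d` and `b`-`c2`-`d`. The outcome depends on the chosen rank and on the toy counts, so it should be read as an intuition aid and not as a reproduction of the reported results.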