CL · NE · Mar 13, 2023

A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches

arXiv:2303.07196v2 · 5 citations · h-index: 23
AI Analysis

This work provides a comparative analysis for NLP researchers and practitioners, but it is incremental as it synthesizes existing methods without introducing new techniques.

The paper empirically evaluates traditional matrix factorization and neural-network-based word embedding approaches on multiple classification tasks, finding that neural methods better capture semantic and syntactic regularities and outperform traditional ones in specific scenarios.

Vector-based word representations help countless Natural Language Processing (NLP) tasks capture the semantic and syntactic regularities of language. In this paper, we present the characteristics of existing word embedding approaches and analyze them on many classification tasks. We categorize the methods into two main groups. Traditional approaches mostly use matrix factorization to produce word representations, and they do not capture the semantic and syntactic regularities of the language very well. Neural-network-based approaches, on the other hand, can capture sophisticated regularities of the language and preserve word relationships in the generated word representations. We report experimental results on multiple classification tasks and highlight the scenarios where one approach performs better than the rest.
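To make the "traditional" family concrete, here is a minimal sketch of a matrix-factorization embedding: count word co-occurrences in a small window, then factorize the matrix with a truncated SVD. It is an illustration only, not the pipeline evaluated in the paper. The toy corpus, the function names `cooccurrence_matrix` and `svd_embeddings`, the window size, the log scaling, and the embedding dimension are all assumptions chosen for the example.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count how often each word pair appears within `window` tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1.0
    return vocab, counts


def svd_embeddings(matrix, dim=2):
    """Keep the top `dim` singular directions as dense word vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

vocab, counts = cooccurrence_matrix(corpus, window=2)
# log1p dampens very frequent pairs; an assumed, commonly used scaling.
vectors = svd_embeddings(np.log1p(counts), dim=2)

for word, vec in zip(vocab, vectors):
    print(word, np.round(vec, 3))
```

Each row of `vectors` is a dense representation of one vocabulary word. Neural approaches instead learn such vectors by optimizing a prediction objective over the corpus, which is the contrast the abstract draws.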

