CL · Aug 20, 2017

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

arXiv:1708.06025v11111 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the evaluation of word embeddings for Portuguese NLP, but it is incremental: it applies existing embedding methods and evaluation tasks to Portuguese, covering both the Brazilian and European variants.

The paper evaluates 31 word embedding models trained on a large Portuguese corpus with FastText, GloVe, Wang2Vec, and Word2Vec. Its results suggest that word analogies are not appropriate for evaluating embeddings, and that task-specific evaluations such as POS tagging and sentence semantic similarity appear to be a better option.

Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants. We trained 31 word embedding models using FastText, GloVe, Wang2Vec and Word2Vec. We evaluated them intrinsically on syntactic and semantic analogies and extrinsically on POS tagging and sentence semantic similarity tasks. The obtained results suggest that word analogies are not appropriate for word embedding evaluation; task-specific evaluations appear to be a better option.
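To make the intrinsic evaluation mentioned in the abstract concrete, here is a minimal sketch of how a word analogy question ("a is to b as c is to ?") is commonly scored. It assumes the vector-offset method (often called 3CosAdd); the abstract does not say which scoring rule the authors used. The function names and the toy three-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb, a, b, c):
    """Return the word whose vector is closest to b - a + c, excluding a, b and c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered correctly.

    Questions containing out-of-vocabulary words are skipped (an assumption;
    other protocols count them as errors).
    """
    answerable = [q for q in questions if all(w in emb for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical toy embeddings, NOT taken from the paper's models.
toy_embeddings = {
    "homem": [1.0, 0.0, 0.0],
    "mulher": [1.0, 1.0, 0.0],
    "rei": [1.0, 0.0, 1.0],
    "rainha": [1.0, 1.0, 1.0],
    "casa": [0.0, 0.0, 0.2],
}

toy_questions = [("homem", "mulher", "rei", "rainha")]
toy_accuracy = analogy_accuracy(toy_embeddings, toy_questions)
```

A score like this depends only on vector geometry, whereas the extrinsic tasks named in the abstract (POS tagging, sentence semantic similarity) measure how useful the vectors are inside a downstream model.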

