CL · Apr 11, 2018

Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks

Maryam Fanaeepour, Adam Makarucha, Jey Han Lau

arXiv:1804.04211

Originality: Synthesis-oriented

AI Analysis

The paper addresses a methodological gap for NLP researchers, but its contribution is incremental because it builds on existing evaluation frameworks.

The study addresses the poorly understood impact of hyper-parameters on word embeddings by empirically measuring how vector dimensions and corpus size affect embedding quality on similarity and analogy tasks.

The versatility of word embeddings across applications is attracting researchers from many fields. However, the impact of hyper-parameters when training embedding models is often poorly understood. How much do hyper-parameters such as vector dimensions and corpus size affect the quality of embeddings, and how do these results translate to downstream applications? Using standard embedding evaluation metrics and datasets, we conduct a study to empirically measure the impact of these hyper-parameters.
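To make the two task types in the title concrete, here is a minimal sketch of how similarity and analogy evaluations are commonly scored. It is not the authors' code. It assumes the widespread conventions of Spearman correlation between cosine similarities and human ratings for the similarity task, and the vector-offset (3CosAdd) method for the analogy task; the abstract does not say which metrics the study used. The vectors, ratings, and function names are hypothetical toy values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def similarity_score(embeddings, rated_pairs):
    """Correlate model cosine similarities with human ratings.

    Pairs containing an out-of-vocabulary word are skipped.
    """
    model, human = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(rating)
    return spearman(model, human)

def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c.

    The three query words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# Hypothetical 4-dimensional toy vectors, not trained embeddings.
embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}

# Hypothetical human similarity ratings on a 0-10 scale.
rated_pairs = [
    ("king", "queen", 8.6),
    ("man", "woman", 8.3),
    ("king", "apple", 1.0),
    ("queen", "woman", 7.0),
]

if __name__ == "__main__":
    print("similarity (Spearman):", similarity_score(embeddings, rated_pairs))
    print("man : king :: woman :", solve_analogy(embeddings, "man", "king", "woman"))
```

In a study of this kind, the same two scores would be recomputed for embeddings trained at each vector dimension and corpus size, so that the effect of each hyper-parameter can be compared on a common scale.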
