CL, LG | Jul 7, 2014

WordRep: A Benchmark for Research on Learning Word Representations

arXiv:1407.1640v1 | 41 citations

Originality: Synthesis-oriented

AI Analysis

WordRep gives natural language processing researchers a standardized tool for evaluating and comparing word embedding algorithms, though the contribution is incremental because it builds on existing benchmark practices.

The authors introduced WordRep, a benchmark collection for evaluating distributed word representations, and compared several state-of-the-art methods on it, reporting their performance metrics.

WordRep is a benchmark collection for research on learning distributed word representations (or word embeddings), released by Microsoft Research. In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embedding. Specifically, we describe how the evaluation tasks in WordRep are selected, how the data are sampled, and how the evaluation tool is built. We then compare several state-of-the-art word representations on WordRep, report their evaluation performance, and discuss the results. After that, we discuss potential new research topics that WordRep can support beyond algorithm comparison. We hope that this paper helps readers gain a deeper understanding of WordRep and enables more interesting research on learning distributed word representations and related topics.
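To make the idea of comparing word representations on a benchmark concrete, the following is a minimal sketch of one common evaluation style, analogy questions of the form a : b :: c : d scored by vector arithmetic. The abstract does not specify the task format or the design of the WordRep evaluation tool, so the question format, the toy vectors, and the names `cosine`, `predict`, and `analogy_accuracy` are all illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict(embeddings, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [y - x + z
              for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

def analogy_accuracy(embeddings, questions):
    """Fraction of in-vocabulary questions (a, b, c, d) answered correctly."""
    # Assumption: questions with out-of-vocabulary words are skipped.
    answered = [q for q in questions if all(w in embeddings for w in q)]
    if not answered:
        return 0.0
    correct = sum(predict(embeddings, a, b, c) == d for a, b, c, d in answered)
    return correct / len(answered)

# Hypothetical two-dimensional vectors, chosen by hand for illustration only.
toy_embeddings = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.5, 1.0],
    "woman": [0.5, -1.0],
    "apple": [-1.0, 0.2],
}

toy_questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "woman", "prince", "princess"),  # skipped: out of vocabulary
]

accuracy = analogy_accuracy(toy_embeddings, toy_questions)
```

Running the same scoring function over embeddings from different training methods, with a fixed question set, is what would make their accuracies comparable; a real benchmark would also need a policy for out-of-vocabulary words, which this sketch handles by skipping them.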
