CL, Mar 17, 2017

Construction of a Japanese Word Similarity Dataset

arXiv:1703.05916v2, 20 citations
AI Analysis

This work fills a resource gap for researchers working on Japanese NLP, though it is incremental: it adapts an existing evaluation method to a new language.

The authors tackled the lack of resources for evaluating distributed word representations in Japanese by constructing the first Japanese word similarity dataset, which includes various parts of speech and rare words.

Distributed word representations are generally evaluated using a word similarity task and/or a word analogy task. There are many datasets readily available for these tasks in English. However, evaluating distributed representations in languages that do not have such resources (e.g., Japanese) is difficult. Therefore, as a first step toward evaluating distributed representations in Japanese, we constructed a Japanese word similarity dataset. To the best of our knowledge, our dataset is the first resource that can be used to evaluate distributed representations in Japanese. Moreover, our dataset contains various parts of speech and includes rare words in addition to common words.
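As a rough illustration of how a word similarity dataset is typically used, the sketch below scores a set of embeddings by comparing cosine similarities against human ratings with Spearman's rank correlation, a common choice for this task. It is not taken from the paper: the toy vectors, the word pairs, the 0 to 10 rating scale, and the function names are all hypothetical, and pairs containing out-of-vocabulary words are simply skipped.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, pairs):
    """Return (rho, number of pairs used); pairs with unknown words are skipped."""
    gold, predicted = [], []
    for word1, word2, human_score in pairs:
        if word1 in embeddings and word2 in embeddings:
            gold.append(human_score)
            predicted.append(cosine(embeddings[word1], embeddings[word2]))
    return spearman(gold, predicted), len(gold)


# Hypothetical 2-dimensional embeddings (real models use hundreds of dimensions).
toy_embeddings = {
    "犬": [1.0, 0.1],  # dog
    "猫": [0.9, 0.2],  # cat
    "車": [0.1, 1.0],  # car
}

# Hypothetical human ratings on an assumed 0-10 scale.
toy_pairs = [
    ("犬", "猫", 8.0),
    ("犬", "車", 2.0),
    ("猫", "車", 1.0),
    ("犬", "鳥", 5.0),  # "鳥" (bird) has no vector, so this pair is skipped
]

rho, pairs_used = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {pairs_used} pairs")
```

A higher correlation means the embedding space orders word pairs more like the human annotators do. Reporting the number of pairs actually used matters for a dataset that includes rare words, since many of them may be missing from a given model's vocabulary.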
