CL, Apr 15, 2018

Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

arXiv:1804.05388v2, 1089 citations
Originality: Synthesis-oriented
AI Analysis

This work provides essential evaluation tools for semantic models in a low-resource language, though it is incremental in that it adapts existing dataset designs to Vietnamese.

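The authors address the lack of semantic evaluation resources for Vietnamese by creating two datasets: ViCon, which pairs synonyms and antonyms to separate similarity from dissimilarity, and ViSim-400, which provides human-rated degrees of similarity. Standard co-occurrence and neural network models yield results on both datasets that are comparable to those on the respective English datasets.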

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.
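
As an illustration of how a rated dataset such as ViSim-400 is typically used, the sketch below scores word pairs with cosine similarity over word vectors and compares the scores with human ratings via Spearman's rank correlation. This is an assumption about the evaluation setup, not the authors' code: the summary above does not name the metric, and the vectors, placeholder word labels, ratings, and rating scale here are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: placeholder words, made-up vectors and ratings.
vectors = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [0.5, 0.5],
}
rated_pairs = [("w1", "w2", 5.5), ("w1", "w4", 3.0), ("w1", "w3", 1.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho: {rho:.3f}")
```

A higher rho means the model orders the pairs more like the human judges do. For a synonym/antonym dataset such as ViCon, a different measure would be needed, since the labels there are categorical rather than graded.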
