CLAIHC · Sep 24, 2021

Rethinking Crowd Sourcing for Semantic Similarity

arXiv:2109.11969v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses reliability issues in NLP tasks that depend on human annotations. It is incremental, however: it focuses on improving existing labeling methods rather than proposing a new one.

The paper tackles ambiguity in crowd-sourced semantic similarity labeling. It shows that annotators who treat similarity as a binary category play the most important role in the labeling, and it provides heuristics to filter out unreliable annotators.

Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators who treat semantic similarity as a binary category (two sentences are either similar or not similar, with no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and stimulates further discussion of human perception of semantic similarity.
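The abstract does not spell out the paper's filtering heuristics. As a purely hypothetical illustration of the two ideas it names, the sketch below (a) checks whether an annotator's scores are "binary", i.e. every score sits at one end of the rating scale, and (b) flags annotators whose scores correlate weakly with the leave-one-out mean of the other annotators. The 0 to 5 scale, the function names `is_binary_rater` and `flag_unreliable`, and the `threshold=0.5` cutoff are all assumptions for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def is_binary_rater(scores, low=0.0, high=5.0):
    # Hypothetical test: an annotator is "binary" when every score they
    # gave sits at one end of the (assumed 0-5) scale, with no middle ground.
    return all(s in (low, high) for s in scores)


def pearson(xs, ys):
    # Plain Pearson correlation; returns 0.0 when either side is constant.
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def flag_unreliable(ratings, threshold=0.5):
    """ratings: {annotator: {item_id: score}}.

    Hypothetical heuristic: flag annotators whose scores correlate weakly
    with the mean score the *other* annotators gave the same items.
    """
    flagged = []
    for ann, own in ratings.items():
        xs, ys = [], []
        for item, score in own.items():
            others = [r[item] for a, r in ratings.items() if a != ann and item in r]
            if others:
                xs.append(score)
                ys.append(mean(others))
        # Too little overlap, or weak agreement, marks the annotator as unreliable.
        if len(xs) < 2 or pearson(xs, ys) < threshold:
            flagged.append(ann)
    return flagged


# Toy data (invented for illustration): a1 is a binary rater, a4 disagrees with everyone.
ratings = {
    "a1": {"p1": 5, "p2": 0, "p3": 5, "p4": 0},
    "a2": {"p1": 4, "p2": 1, "p3": 4, "p4": 1},
    "a3": {"p1": 4.5, "p2": 0.5, "p3": 3.5, "p4": 1.5},
    "a4": {"p1": 0, "p2": 5, "p3": 1, "p4": 4},
}

print(is_binary_rater(ratings["a1"].values()))  # True
print(flag_unreliable(ratings))                 # ['a4']
```

In this toy data the binary rater `a1` agrees perfectly in direction with the others and is kept, which mirrors the abstract's point that binary annotators are not noise; only the contrarian `a4` is flagged.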
