Rethinking Crowd Sourcing for Semantic Similarity
This paper addresses reliability issues in NLP tasks that depend on human annotations, but its contribution is incremental because it focuses on improving existing labeling methods.
The paper tackles ambiguity in crowd-sourced semantic similarity labeling. It shows that annotators who treat similarity as a binary category (similar or not similar) play the most important role in the labeling, and it provides heuristics to filter out unreliable annotators.
Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators that treat semantic similarity as a binary category (two sentences are either similar or not similar and there is no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and stimulates further discussions on human perception of semantic similarity.
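To make the notion of a "binary" annotator concrete, the sketch below flags annotators whose labels fall almost entirely at the two ends of the rating scale. This is an illustration only, not the paper's heuristics, which this summary does not spell out. The 1 to 5 scale, the 0.9 threshold, the `(annotator_id, item_id, label)` record layout, and the function names `extreme_label_share` and `split_annotators` are all assumptions introduced here.

```python
from collections import defaultdict


def extreme_label_share(labels, low=1, high=5):
    """Return the fraction of labels that sit at either end of the scale."""
    if not labels:
        return 0.0
    extremes = sum(1 for label in labels if label in (low, high))
    return extremes / len(labels)


def split_annotators(annotations, low=1, high=5, threshold=0.9):
    """Partition annotator ids into 'binary' and 'graded' groups.

    annotations: iterable of (annotator_id, item_id, label) tuples.
    The scale bounds and the threshold are hypothetical defaults.
    """
    by_annotator = defaultdict(list)
    for annotator_id, _item_id, label in annotations:
        by_annotator[annotator_id].append(label)

    binary, graded = [], []
    for annotator_id, labels in sorted(by_annotator.items()):
        share = extreme_label_share(labels, low, high)
        # An annotator who almost never uses the middle of the scale
        # is treated as labeling in a binary fashion.
        if share >= threshold:
            binary.append(annotator_id)
        else:
            graded.append(annotator_id)
    return binary, graded


if __name__ == "__main__":
    sample = [
        ("a", 1, 1), ("a", 2, 5), ("a", 3, 5),
        ("b", 1, 2), ("b", 2, 3), ("b", 3, 5),
    ]
    print(split_annotators(sample))
```

On the toy sample, annotator "a" uses only the scale endpoints and lands in the binary group, while "b" uses intermediate values and lands in the graded group. A real filtering step would likely also need agreement statistics across annotators, which this sketch leaves out.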