Crowdsourcing Semantic Label Propagation in Relation Classification
This work addresses noisy labeling in relation extraction, a problem relevant to NLP researchers, but the contribution is incremental because it builds on existing crowdsourcing methods.
The paper tackles noisy labels in distantly supervised relation extraction by propagating human annotations gathered with the CrowdTruth methodology, which captures ambiguity through inter-annotator disagreement. Propagation expands the number of labels by two orders of magnitude and yields a significant improvement in a sentence-level multi-class relation classifier.
Distant supervision is a popular method for performing relation extraction from text that is known to produce noisy labels. Most progress in relation extraction and classification has been made with crowdsourced corrections to distant-supervised labels, and there is evidence that indicates still more would be better. In this paper, we explore the problem of propagating human annotation signals gathered for open-domain relation classification through the CrowdTruth methodology for crowdsourcing, which captures ambiguity in annotations by measuring inter-annotator disagreement. Our approach propagates annotations to sentences that are similar in a low-dimensional embedding space, expanding the number of labels by two orders of magnitude. Our experiments show significant improvement in a sentence-level multi-class relation classifier.
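The propagation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the 0.8 threshold, the similarity-weighted average, and the names `cosine` and `propagate_scores` are all assumptions made for illustration. Each crowd-annotated seed sentence carries a soft relation score in [0, 1], standing in for a disagreement-aware crowd score, and each unlabeled sentence inherits a score from the seeds that lie close to it in the embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def propagate_scores(seed_vectors, seed_scores, target_vectors, threshold=0.8):
    """Propagate soft relation scores from annotated seeds to unlabeled targets.

    For each target embedding, take the similarity-weighted average of the
    scores of all seeds whose cosine similarity is at least `threshold`.
    Targets with no sufficiently similar seed get None (left unlabeled).
    The threshold and the weighting scheme are illustrative assumptions.
    """
    propagated = []
    for target in target_vectors:
        weighted_sum = 0.0
        weight_total = 0.0
        for seed, score in zip(seed_vectors, seed_scores):
            sim = cosine(target, seed)
            if sim >= threshold:
                weighted_sum += sim * score
                weight_total += sim
        propagated.append(weighted_sum / weight_total if weight_total > 0 else None)
    return propagated


# Toy 2-D embeddings: two crowd-annotated seeds and three unlabeled sentences.
seeds = [[1.0, 0.0], [0.0, 1.0]]
scores = [0.9, 0.1]  # hypothetical crowd scores for one relation
targets = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
result = propagate_scores(seeds, scores, targets)
```

In this toy example the first and third targets point in the same direction as a seed and inherit its score, while the second target sits between the two seeds, falls below the threshold for both, and stays unlabeled. Keeping the scores continuous rather than thresholding them to hard labels is one way to carry the ambiguity signal through propagation.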