CL · Jun 10, 2016

Bootstrapping Distantly Supervised IE using Joint Learning and Small Well-structured Corpora

arXiv:1606.03398v2 · 5 citations
AI Analysis

This work addresses the challenge of noisy data in distantly-supervised information extraction, offering a method to enhance relation extraction performance for applications like knowledge base construction, though it is incremental in nature.

The paper improves distantly-supervised relation extraction by jointly learning two tasks: concept-instance extraction and relation extraction. It takes high-precision seed examples from small, well-structured corpora, expands them by label propagation over noisy examples from a large unstructured corpus, and reports significant improvements over several state-of-the-art approaches.

We propose a framework to improve performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We combine this with a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction.
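The seed-expansion step described above can be illustrated with a minimal sketch of clamped label propagation on a weighted graph. This is a generic textbook variant, not the authors' implementation: the function name `propagate_labels`, the edge weights, the node names, and the 0.5 starting score are all assumptions made for illustration, and the sketch does not model the type constraint between concept examples and the relation's second argument.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Clamped label propagation on an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) tuples, weights assumed positive
    seeds: dict mapping seed nodes to a fixed score in [0, 1]
    Returns a dict mapping every node to a score in [0, 1].
    """
    # Build an adjacency list; each edge is stored in both directions.
    neighbours = {}
    for a, b, w in edges:
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))

    # Non-seed nodes start at 0.5 (no evidence either way); seeds keep their score.
    scores = {node: 0.5 for node in neighbours}
    scores.update(seeds)

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                continue  # seed scores stay clamped
            total = sum(w for _, w in nbrs)
            if total > 0:
                # New score is the weighted average of the neighbours' scores.
                updated[node] = sum(w * scores[n] for n, w in nbrs) / total
        scores.update(updated)
    return scores


# Hypothetical toy graph: two seeds at the ends of a chain of noisy candidates.
edges = [
    ("seed_pos", "cand_a", 1.0),
    ("cand_a", "cand_b", 1.0),
    ("cand_b", "seed_neg", 1.0),
]
seeds = {"seed_pos": 1.0, "seed_neg": 0.0}
scores = propagate_labels(edges, seeds)
```

In this toy chain the candidate next to the positive seed ends up with a higher score than the one next to the negative seed. In a real pipeline the edge weights would come from some similarity measure between extracted examples, which the abstract does not specify.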

