Distant Supervision for Relation Extraction beyond the Sentence Boundary
This work addresses the need for structured knowledge in precision medicine by enabling more comprehensive relation extraction from biomedical texts, though the contribution is incremental in that it extends distant supervision to cross-sentence contexts.
The paper tackles cross-sentence relation extraction, which is under-explored in distant supervision, by proposing a graph-based approach that incorporates dependencies and discourse relations. The approach extracts twice as many relations as the existing distant supervision paradigm at similar precision.
The growing demand for structured knowledge has led to great interest in relation extraction, especially in cases with limited supervision. However, existing distant supervision approaches only extract relations expressed in single sentences. In general, cross-sentence relation extraction is under-explored, even in the supervised-learning setting. In this paper, we propose the first approach for applying distant supervision to cross-sentence relation extraction. At the core of our approach is a graph representation that can incorporate both standard dependencies and discourse relations, thus providing a unifying way to model relations within and across sentences. We extract features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error. Experiments on an important extraction task for precision medicine show that our approach can learn an accurate cross-sentence extractor, using only a small existing knowledge base and unlabeled text from biomedical research articles. Compared to the existing distant supervision paradigm, our approach extracted twice as many relations at similar precision, thus demonstrating the prevalence of cross-sentence relations and the promise of our approach.
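The idea of a single graph spanning several sentences, with features drawn from more than one path between two entity mentions, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two toy sentences ("Tumors responded to gefitinib. They harbored L858R."), the edge labels (`nsubj`, `nmod`, `dobj`, `adj`, `next_sent`), the choice to enumerate all shortest paths, and the function names `build_graph`, `shortest_label_paths`, and `path_features`.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an adjacency map from (src, dst, label) triples.

    Each edge is stored in both directions; the reverse direction
    carries the label with a "^-1" suffix (a hypothetical convention).
    """
    graph = defaultdict(list)
    for src, dst, label in edges:
        graph[src].append((dst, label))
        graph[dst].append((src, label + "^-1"))
    return graph


def shortest_label_paths(graph, start, goal):
    """Return every shortest path from start to goal as a tuple of edge labels."""
    # Breadth-first search to get the distance of each node from start.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt, _ in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    if goal not in dist:
        return []

    # Enumerate paths that move exactly one step farther from start each hop.
    paths = []

    def walk(node, labels):
        if node == goal:
            paths.append(tuple(labels))
            return
        for nxt, label in graph.get(node, ()):
            if dist.get(nxt) == dist[node] + 1 and dist[nxt] <= dist[goal]:
                walk(nxt, labels + [label])

    walk(start, [])
    return paths


def path_features(graph, start, goal):
    """Turn each shortest path into one string feature."""
    return sorted("path:" + ">".join(p) for p in shortest_label_paths(graph, start, goal))


# Toy document graph (hypothetical labels and sentences).
EDGES = [
    ("responded", "Tumors", "nsubj"),        # dependency, sentence 1
    ("responded", "gefitinib", "nmod"),      # dependency, sentence 1
    ("harbored", "They", "nsubj"),           # dependency, sentence 2
    ("harbored", "L858R", "dobj"),           # dependency, sentence 2
    ("responded", "harbored", "next_sent"),  # assumed link between sentence roots
    ("gefitinib", "They", "adj"),            # assumed word-adjacency link
]

GRAPH = build_graph(EDGES)
FEATURES = path_features(GRAPH, "gefitinib", "L858R")
print(FEATURES)
```

In this toy graph two distinct shortest paths connect the drug mention to the mutation mention, one through the linked sentence roots and one through the adjacency edge. Using both as features means that a parse error on one path does not remove all evidence for the candidate pair, which is the kind of robustness the abstract attributes to multi-path features.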