Document-Level $N$-ary Relation Extraction with Multiscale Representation Learning
This work addresses the need for high-recall, high-precision extraction of complex relations in domains such as precision oncology, where existing cross-sentence methods are confined to small text spans.
The paper tackles the problem of extracting $n$-ary relations from entire documents, where entity mentions may be far apart, by proposing a multiscale neural architecture that combines representations across text spans and the subrelation hierarchy. The resulting system substantially outperforms previous $n$-ary relation extraction methods in biomedical machine reading experiments.
Most information extraction methods focus on binary relations expressed within single sentences. In high-value domains, however, $n$-ary relations are in great demand (e.g., drug-gene-mutation interactions in precision oncology). Such relations often involve entity mentions that are far apart in the document, yet existing work on cross-sentence relation extraction is generally confined to small text spans (e.g., three consecutive sentences), which severely limits recall. In this paper, we propose a novel multiscale neural architecture for document-level $n$-ary relation extraction. Our system combines representations learned over various text spans throughout the document and across the subrelation hierarchy. Widening the system's purview to the entire document maximizes potential recall. Moreover, by integrating weak signals across the document, multiscale modeling increases precision, even in the presence of noisy labels from distant supervision. Experiments on biomedical machine reading show that our approach substantially outperforms previous $n$-ary relation extraction methods.
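To make the recall argument concrete, the following minimal sketch checks whether a candidate drug-gene-mutation tuple can be covered at all by a fixed window of consecutive sentences. It is an illustration only, not the paper's candidate-generation procedure: the function names `min_sentence_span` and `reachable`, the dictionary layout, and the sentence indices are all hypothetical.

```python
from itertools import product


def min_sentence_span(mentions_by_entity):
    """Return the smallest number of consecutive sentences that contains
    at least one mention of every entity, or None if an entity has no mention.

    mentions_by_entity maps an entity name to the sentence indices
    where it is mentioned (hypothetical input format).
    """
    best = None
    # Try every way of picking one mention per entity.
    for combo in product(*mentions_by_entity.values()):
        span = max(combo) - min(combo) + 1
        if best is None or span < best:
            best = span
    return best


def reachable(mentions_by_entity, window=None):
    """True if the tuple can be seen by an extractor limited to `window`
    consecutive sentences; window=None means the whole document."""
    span = min_sentence_span(mentions_by_entity)
    if span is None:  # some entity is never mentioned
        return False
    return window is None or span <= window


# Made-up document: the mutation is mentioned only once, far from the drug.
doc = {"drug": [2, 40], "gene": [5, 41], "mutation": [17]}

print(min_sentence_span(doc))    # tightest span covering all three entities
print(reachable(doc, window=3))  # three-sentence extractor
print(reachable(doc))            # document-level extractor
```

In this made-up example the tightest span covering all three entities is 16 sentences, so an extractor restricted to three consecutive sentences can never propose the tuple, while a document-level extractor can. This is the sense in which a small window caps recall regardless of how good the classifier is.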