IR · Jan 4, 2021

Coreference Resolution in Research Papers from Multiple Domains

Arthur Brack, Daniel Uwe Müller, Anett Hoppe, Ralph Ewerth

arXiv:2101.00884v17.515 citations · Has Code

Originality: Highly original

AI Analysis

This work is relevant to researchers and practitioners in natural language processing and information retrieval who need to extract information accurately from scientific literature. It reports a substantial gain in coreference resolution for this domain.

This paper addresses the decline in coreference resolution performance when state-of-the-art approaches are applied to scientific papers. The authors propose a transfer learning approach that reaches an F1 score of 61.4 (+11.0 over state-of-the-art baselines) on their new corpus. In an evaluation against a gold standard KG, coreference resolution raises the F1 score of the populated knowledge graph to 63.5 (+21.8).

Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).
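To make the link between coreference resolution and KG population concrete, here is a minimal toy sketch. It is not the authors' pipeline: the mentions, clusters, and gold nodes are invented for illustration, and the function names `populate_nodes` and `f1`, along with the rule of naming each cluster by its longest mention, are assumptions. It only shows why collapsing coreferent mentions into one node can raise the F1 of a populated graph against a gold standard.

```python
from __future__ import annotations


def f1(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of KG nodes."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def populate_nodes(mentions: list[str], clusters: list[list[int]]) -> set[str]:
    """Map each coreference cluster to one node; unclustered mentions stay separate.

    Assumption: a cluster is named by its longest mention (a toy heuristic).
    """
    nodes: set[str] = set()
    clustered: set[int] = set()
    for cluster in clusters:
        clustered.update(cluster)
        longest = max((mentions[i] for i in cluster), key=len)
        nodes.add(longest.lower())
    for index, mention in enumerate(mentions):
        if index not in clustered:
            nodes.add(mention.lower())
    return nodes


# Hypothetical toy data: four mentions, three of which refer to the same concept.
mentions = ["a convolutional neural network", "the network", "it", "ImageNet"]
gold = {"a convolutional neural network", "imagenet"}

without_coref = populate_nodes(mentions, clusters=[])
with_coref = populate_nodes(mentions, clusters=[[0, 1, 2]])

print(f"F1 without coreference: {f1(without_coref, gold):.3f}")
print(f"F1 with coreference:    {f1(with_coref, gold):.3f}")
```

In this toy case the unresolved mentions "the network" and "it" become spurious nodes that lower precision; merging them into their cluster removes them. The figures printed here say nothing about the paper's reported scores.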

