Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning
This work addresses cross-document coreference and hierarchy inference in scientific text, a task underlying applications such as knowledge graph construction and search. It represents an incremental improvement, achieved through a novel definition-augmented method.
The paper tackles the problem of inferring cross-document coreference and hierarchy in scientific texts, where LLMs struggle with long-tail technical concepts. It generates context-dependent definitions and relational definitions to enhance relation detection, achieving large performance gains on challenging subsets with high ambiguity and many different surface forms.
We address the fundamental task of inferring cross-document coreference and hierarchy in scientific texts, which has important applications in knowledge graph construction, search, recommendation and discovery. Large Language Models (LLMs) can struggle when faced with many long-tail technical concepts with nuanced variations. We present a novel method that generates context-dependent definitions of concept mentions by retrieving full-text literature, and uses the definitions to enhance detection of cross-document relations. We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the combinatorial explosion involved in inferring links across papers. In both fine-tuning and in-context learning settings, we achieve large gains in performance on data subsets with many different surface forms and high ambiguity, which are challenging for models. We provide an analysis of the generated definitions, shedding light on the relational reasoning ability of LLMs over fine-grained scientific concepts.
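The re-ranking idea in the abstract can be illustrated with a minimal two-stage sketch. It is not the authors' implementation: the function names (`jaccard`, `shortlist_pairs`, `rerank`), the token-overlap scorer, and the `top_k` parameter are all assumptions made for illustration. The point it shows is that a cheap score over per-mention definitions can prune the quadratic set of mention pairs, so that a costly step (such as generating a relational definition with an LLM, represented here by a hypothetical `expensive_scorer` callable) only runs on a shortlist.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Cheap stand-in scorer (an assumption): token overlap of two definitions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def shortlist_pairs(definitions: Dict[str, str], top_k: int = 2) -> List[Tuple[str, str]]:
    """Stage 1: score every mention pair cheaply, keep each mention's top_k partners."""
    partners: Dict[str, List[Tuple[float, str]]] = {m: [] for m in definitions}
    for a, b in combinations(definitions, 2):
        score = jaccard(definitions[a], definitions[b])
        partners[a].append((score, b))
        partners[b].append((score, a))

    kept = set()
    for mention, scored in partners.items():
        # Highest score first; ties broken by mention id for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        for _, other in scored[:top_k]:
            kept.add(tuple(sorted((mention, other))))
    return sorted(kept)


def rerank(
    pairs: List[Tuple[str, str]],
    definitions: Dict[str, str],
    expensive_scorer: Callable[[str, str], float],
) -> List[Tuple[float, str, str]]:
    """Stage 2: apply the costly scorer to shortlisted pairs only."""
    scored = [(expensive_scorer(definitions[a], definitions[b]), a, b) for a, b in pairs]
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


# Hypothetical mention ids and definitions, invented for this example.
definitions = {
    "m1": "neural network trained with backpropagation",
    "m2": "deep neural network trained with gradient descent",
    "m3": "tokenizer splitting text into subwords",
    "m4": "subwords tokenizer for text",
}
shortlist = shortlist_pairs(definitions, top_k=1)
ranked = rerank(shortlist, definitions, expensive_scorer=jaccard)
```

With four mentions there are six candidate pairs; the shortlist keeps two of them, so the costly scorer runs twice instead of six times. In a real system the second-stage scorer would be a learned model rather than the token-overlap placeholder reused here.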