CL · IR · LG · Sep 1, 2025

ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links

arXiv:2509.01387v1 · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the lack of efficient methods for creating datasets for cross-document understanding, enabling systematic study across applications such as media framing and peer review.

The paper tackles the problem of creating training and evaluation datasets for cross-document fine-grained links by introducing a domain-agnostic framework for selecting and applying linking approaches. In two domains, peer review and news, the best approach found, combining retrieval models with LLMs, achieves 78% link approval from human raters and more than doubles the precision of strong retrievers alone.
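
The gain is attributed to pairing a retriever with an LLM. A minimal sketch of that retrieve-then-verify pattern follows, under loud assumptions: the bag-of-words cosine scorer is a toy stand-in for a real retriever, `toy_judge` is a hypothetical placeholder for an LLM call, and all names (`bow_cosine`, `retrieve`, `link`, `precision`) are illustrative rather than the paper's code. The toy data only shows why a verification filter can raise precision over raw retriever candidates; it reproduces none of the paper's numbers.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Link = Tuple[int, int]  # (source passage index, target passage index)


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts: a toy stand-in for a real retriever."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve(source: str, targets: List[str], top_k: int) -> List[int]:
    """Indices of the top_k target passages, highest retriever score first."""
    ranked = sorted(range(len(targets)), key=lambda i: bow_cosine(source, targets[i]), reverse=True)
    return ranked[:top_k]


def link(
    sources: List[str],
    targets: List[str],
    judge: Callable[[str, str], bool],
    top_k: int = 2,
) -> List[Link]:
    """Retrieve top_k candidates per source passage, then keep only judge-approved pairs."""
    links = []
    for s_idx, src in enumerate(sources):
        for t_idx in retrieve(src, targets, top_k):
            if judge(src, targets[t_idx]):
                links.append((s_idx, t_idx))
    return links


def precision(predicted: List[Link], gold: List[Link]) -> float:
    """Fraction of predicted links that appear in the gold set."""
    pred = set(predicted)
    return len(pred & set(gold)) / len(pred) if pred else 0.0


def toy_judge(src: str, tgt: str) -> bool:
    # Hypothetical placeholder: a real system would prompt an LLM to decide
    # whether tgt is a fine-grained link for src. Here we use a fixed threshold.
    return bow_cosine(src, tgt) >= 0.3


if __name__ == "__main__":
    sources = ["missing baseline comparison", "clear writing"]
    targets = ["we add a baseline comparison", "the writing is clear", "we use a transformer"]
    gold = [(0, 0), (1, 1)]
    retriever_only = link(sources, targets, judge=lambda s, t: True)
    with_judge = link(sources, targets, judge=toy_judge)
    print(precision(retriever_only, gold), precision(with_judge, gold))  # 0.5 1.0
```

Passing `judge=lambda s, t: True` turns the same pipeline into a retriever-only baseline, which keeps the comparison between the two settings like-for-like.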

Understanding fine-grained relations between documents is crucial for many application domains. However, the study of automated assistance is limited by the lack of efficient methods to create training and evaluation datasets of cross-document links. To address this, we introduce a new domain-agnostic framework for selecting a best-performing approach and annotating cross-document links in a new domain from scratch. We first generate and validate semi-synthetic datasets of interconnected documents. This data is used to perform automatic evaluation, producing a shortlist of best-performing linking approaches. These approaches are then used in an extensive human evaluation study, yielding performance estimates on natural text pairs. We apply our framework in two distinct domains (peer review and news) and show that combining retrieval models with LLMs achieves 78% link approval from human raters, more than doubling the precision of strong retrievers alone. Our framework enables systematic study of cross-document understanding across application scenarios, and the resulting novel datasets lay the foundation for numerous cross-document tasks like media framing and peer review. We make the code, data, and annotation protocols openly available.
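
The selection loop described above (automatic evaluation on semi-synthetic data, a shortlist of approaches, then human approval of the links they propose) can be sketched as follows. This is an illustration under stated assumptions, not the authors' released code: it presumes each semi-synthetic document pair is stored with known gold links, it ranks approaches by mean F1 (the abstract does not name the metric), and `exact_match`, `link_everything`, and `link_nothing` are hypothetical baselines used only to exercise the functions.

```python
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source passage index, target passage index)
Linker = Callable[[List[str], List[str]], Set[Link]]
# Assumed shape of one semi-synthetic example: source passages, target passages, gold links.
Example = Tuple[List[str], List[str], Set[Link]]


def f1(pred: Set[Link], gold: Set[Link]) -> float:
    """Harmonic mean of precision and recall over link sets."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def shortlist(approaches: Dict[str, Linker], dataset: List[Example], n: int) -> List[str]:
    """Rank approaches by mean F1 on the semi-synthetic data and keep the top n."""
    scores = {}
    for name, linker in approaches.items():
        per_example = [f1(linker(src, tgt), gold) for src, tgt, gold in dataset]
        scores[name] = sum(per_example) / len(per_example)
    return sorted(scores, key=scores.get, reverse=True)[:n]


def approval_rate(judgments: List[bool]) -> float:
    """Share of proposed links that human raters approved."""
    return sum(judgments) / len(judgments) if judgments else 0.0


# Hypothetical baseline linkers used only to exercise the sketch.
def exact_match(src: List[str], tgt: List[str]) -> Set[Link]:
    return {(i, j) for i, s in enumerate(src) for j, t in enumerate(tgt) if s == t}


def link_everything(src: List[str], tgt: List[str]) -> Set[Link]:
    return {(i, j) for i in range(len(src)) for j in range(len(tgt))}


def link_nothing(src: List[str], tgt: List[str]) -> Set[Link]:
    return set()


if __name__ == "__main__":
    dataset = [(
        ["the model overfits", "results lack variance"],
        ["results lack variance", "the model overfits", "we thank the reviewers"],
        {(0, 1), (1, 0)},
    )]
    approaches = {"exact_match": exact_match, "link_everything": link_everything, "link_nothing": link_nothing}
    print(shortlist(approaches, dataset, n=2))  # ['exact_match', 'link_everything']
    print(approval_rate([True, True, False, True]))  # 0.75
```

Any real approach, such as a retriever-plus-LLM pipeline, can be plugged in as long as it matches the `Linker` signature; only the shortlisted names would then go on to human rating.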
