IR AI Nov 5, 2025

Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning

arXiv:2511.03330v1 3.6 h-index: 3

Originality: Highly original

AI Analysis

This work addresses the challenge of identifying relevant scientific papers for researchers and users in open-access environments, offering an incremental improvement through a structured and interpretable approach.

The paper tackles content-based scientific paper recommendation. Existing models neglect the discourse organization of papers, so the authors propose a hierarchical framework that integrates QA-style summarization and multi-level contrastive learning. They report improvements of up to 7.2% in Precision@10 and 3.8% in Recall@10 over state-of-the-art baselines.
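For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Precision@k and Recall@k are conventionally computed for a single query. The function name, the paper IDs, and the relevance set are hypothetical illustrations, not taken from the paper, and the paper's exact evaluation protocol (for example, how scores are averaged over queries) is not stated here.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k=10):
    """Return (Precision@k, Recall@k) for one query.

    ranked_ids: recommended paper IDs, best first (assumed unique).
    relevant_ids: IDs of the papers judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top_k = ranked_ids[:k]
    # Count how many of the top-k recommendations are relevant.
    hits = sum(1 for pid in top_k if pid in relevant)
    precision = hits / k
    # Recall is undefined with no relevant items; report 0.0 by convention here.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 2 of the 4 relevant papers appear in the top 10.
ranked = ["p7", "p2", "p9", "p4", "p1", "p8", "p3", "p6", "p5", "p0"]
relevant = {"p2", "p4", "p11", "p12"}
p_at_10, r_at_10 = precision_recall_at_k(ranked, relevant, k=10)
```

In this toy case Precision@10 is 2/10 and Recall@10 is 2/4; in practice the per-query values would typically be averaged over all test queries.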

The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
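The abstract does not specify the form of the multi-level contrastive objectives. As a rough illustration only, the sketch below assumes an InfoNCE-style loss with in-batch negatives at each level (metadata, section, document) combined by a weighted sum. The function names, the temperature value, and the per-level weights are all assumptions made for this example and may differ from what OMRC-MR actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other row of
    `positives` serves as an in-batch negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def multi_level_loss(levels, weights, temperature=0.1):
    """Weighted sum of per-level InfoNCE losses (assumed combination rule).

    levels: dict mapping a level name to an (anchors, positives) pair.
    weights: dict mapping the same level names to hypothetical weights.
    """
    return sum(
        weights[name] * info_nce(anchors, positives, temperature)
        for name, (anchors, positives) in levels.items()
    )


# Toy embeddings: aligned pairs should give a much lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls each anchor toward its paired representation and away from the other items in the batch, which is one common way to align representations across levels.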
