IR CL May 19, 2025

CoRank: LLM-Based Compact Reranking with Document Features for Scientific Retrieval

Runchu Tian, Xueqiang Xu, Bowen Jin, SeongKu Kang, Jiawei Han

arXiv:2505.13757v2 6.34 citations h-index: 16

Originality: Incremental advance

AI Analysis

This work targets retrieval for scientific researchers, aiming to improve both the efficiency and the accuracy of reranking. It is an incremental advance: it builds on existing LLM reranking methods and contributes a new way of integrating document features.

The paper tackles two problems in LLM-based listwise reranking for scientific retrieval: suboptimal first-stage retrieval and limited candidate coverage. It proposes CoRank, a training-free framework built on compact document representations, which raises average nDCG@10 from 50.6 to 55.5 across 5 datasets.
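For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 is commonly computed. It assumes linear gain and a log2 rank discount; some evaluation toolkits use an exponential gain (2^rel - 1) instead, and the listing does not say which variant the paper uses. The function names and the `all_rels` argument are illustrative. The raw score lies between 0 and 1, so the 50.6 and 55.5 figures quoted here are presumably on a 0 to 100 scale.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    # enumerate starts at 0, so position 1 is discounted by log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_rels: Sequence[float], all_rels: Sequence[float], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_rels: relevance labels of the returned documents, in ranked order.
    all_rels:    relevance labels of every judged document for the query,
                 used to build the ideal ordering (an assumption of this sketch).
    """
    ideal = dcg_at_k(sorted(all_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0
```

A dataset-level score is then the mean of `ndcg_at_k` over all queries.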

Scientific retrieval is essential for advancing scientific knowledge discovery. Within this process, document reranking plays a critical role in refining first-stage retrieval results. However, standard LLM listwise reranking faces challenges in the scientific domain. First-stage retrieval is often suboptimal in the scientific domain, so relevant documents are ranked lower. Meanwhile, conventional listwise reranking places the full text of candidates into the context window, limiting the number of candidates that can be considered. As a result, many relevant documents are excluded before reranking, constraining overall retrieval performance. To address these challenges, we explore semantic-feature-based compact document representations (e.g., categories, sections, and keywords) and propose CoRank, a training-free, model-agnostic reranking framework for scientific retrieval. It presents a three-stage solution: (i) offline extraction of document features, (ii) coarse-grained reranking using these compact representations, and (iii) fine-grained reranking on full texts of the top candidates from (ii). This integrated process addresses suboptimal first-stage retrieval: Compact representations allow more documents to fit within the context window, improving candidate set coverage, while the final fine-grained ranking ensures a more accurate ordering. Experiments on 5 academic retrieval datasets show that CoRank significantly improves reranking performance across different LLM backbones (average nDCG@10 from 50.6 to 55.5). Overall, these results underscore the synergistic interaction between information extraction and information retrieval, demonstrating how structured semantic features can enhance reranking in the scientific domain.
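The three stages described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `Doc`, `extract_features`, `overlap_ranker`, `corank_sketch`, and the `coarse_k` and `top_k` parameters are all hypothetical names. The term-overlap ranker is a toy stand-in for an LLM listwise reranker, and the keyword picker is a toy stand-in for extracting categories, sections, and keywords.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Doc:
    doc_id: str
    text: str
    features: str = ""  # compact representation, filled in offline


# A listwise ranker maps (query, passages) to passage indices, best first.
Ranker = Callable[[str, Sequence[str]], List[int]]


def extract_features(doc: Doc, max_keywords: int = 5) -> str:
    """Stage (i), offline. Toy stand-in: keep the first few distinct long words."""
    seen: List[str] = []
    for word in doc.text.lower().split():
        word = word.strip(".,;:()")
        if len(word) > 4 and word not in seen:
            seen.append(word)
    return " ".join(seen[:max_keywords])


def overlap_ranker(query: str, passages: Sequence[str]) -> List[int]:
    """Toy stand-in for an LLM listwise reranker: order by query-term overlap."""
    terms = set(query.lower().split())
    scores = [len(terms & set(p.lower().split())) for p in passages]
    # sorted() is stable, so ties keep their first-stage order.
    return sorted(range(len(passages)), key=lambda i: -scores[i])


def corank_sketch(query: str, candidates: Sequence[Doc], ranker: Ranker,
                  coarse_k: int = 20, top_k: int = 10) -> List[Doc]:
    # Stage (ii): coarse rerank over compact features, so many candidates fit.
    coarse_order = ranker(query, [d.features for d in candidates])
    shortlist = [candidates[i] for i in coarse_order[:coarse_k]]
    # Stage (iii): fine rerank over the full text of the shortlist only.
    fine_order = ranker(query, [d.text for d in shortlist])
    return [shortlist[i] for i in fine_order[:top_k]]


docs = [
    Doc("d1", "Graph neural networks for molecule property prediction"),
    Doc("d2", "Listwise reranking with large language models for scientific retrieval"),
    Doc("d3", "Protein folding dynamics simulated with molecular methods"),
]
for d in docs:
    d.features = extract_features(d)  # done once, ahead of query time

results = corank_sketch("listwise reranking scientific retrieval", docs,
                        overlap_ranker, coarse_k=2, top_k=2)
```

Passing the ranker in as a callable mirrors the model-agnostic framing: the same two-pass structure would apply to any backbone that can order a list of passages.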

