RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering

Zhuoyu Wu, Wenhui Ou, Pei-Sze Tan, Wenqi Fang, Sailaja Rajanala, Raphaël C.-W. Phan

arXiv:2604.11229 · 40.0 · h-index: 4 · Has Code

Predicted impact: top 21% in SP · last 90 days · Originality: Incremental advance

AI Analysis

For researchers in materials science, RECIPER addresses the challenge of retrieving scattered synthesis details from long documents, offering a practical improvement over paragraph-only dense retrieval.

RECIPER is a dual-view retrieval pipeline that combines paragraph-level context with LLM-extracted procedural summaries, improving early-rank retrieval for procedure-oriented materials question answering. It achieves average gains of +3.73 in Recall@1, +2.85 in nDCG@10, and +3.13 in MRR across four dense retrieval backbones.
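For readers less familiar with the three metrics named above, the following is a minimal sketch of how Recall@k, MRR, and nDCG@k are commonly computed for a single query with binary relevance. The function names and the binary-relevance setup are assumptions made for illustration; the listing does not specify the exact evaluation protocol used in the paper.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = set(ranked_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved.

    MRR is the mean of this value over all queries.
    """
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary gains (assumption: each relevant id has gain 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # Ideal ranking places all relevant ids at the top.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Reported scores are typically averages of these per-query values over the whole query set, which is why a gain at Recall@1 reflects more queries whose correct paragraph is ranked first.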

Retrieving procedure-oriented evidence from materials science papers is difficult because key synthesis details are often scattered across long, context-heavy documents and are not well captured by paragraph-only dense retrieval. We present RECIPER, a dual-view retrieval pipeline that indexes both paragraph-level context and compact large language model-extracted procedural summaries, then combines the two candidate streams with lightweight lexical reranking. Across four dense retrieval backbones, RECIPER consistently improves early-rank retrieval over paragraph-only dense retrieval, achieving average gains of +3.73 in Recall@1, +2.85 in nDCG@10, and +3.13 in MRR. With BGE-large-en-v1.5, it reaches 86.82%, 97.07%, and 97.85% on Recall@1, Recall@5, and Recall@10, respectively. We further observe improved downstream question answering under automatic metrics, suggesting that procedural summaries can serve as a useful complementary retrieval signal for procedure-oriented materials question answering. Code and data are available at https://github.com/ReaganWu/RECIPER.
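The dual-view idea described in the abstract can be illustrated with a small, dependency-free sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the cosine-similarity search over precomputed vectors, the max-score merge of the two candidate streams, the token-overlap reranker, and the weight `alpha` are all hypothetical. The actual method is in the linked repository.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_top_k(query_vec, entries, k):
    """entries: list of (paragraph_id, text, vector) tuples.

    Returns the k best (score, paragraph_id, text) tuples by cosine similarity.
    """
    scored = [(cosine(query_vec, vec), pid, text) for pid, text, vec in entries]
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(query, text):
    """Share of query tokens that also occur in the text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def dual_view_retrieve(query, query_vec, paragraph_view, summary_view,
                       k=10, alpha=0.5):
    """Merge two dense candidate streams, then rerank lexically.

    Assumption: each summary entry carries the id of the paragraph it was
    extracted from, so both views resolve to paragraph ids.
    """
    candidates = {}
    for view in (paragraph_view, summary_view):
        for score, pid, text in dense_top_k(query_vec, view, k):
            entry = candidates.setdefault(pid, {"dense": score, "texts": []})
            # Keep the better dense score across the two views.
            entry["dense"] = max(entry["dense"], score)
            entry["texts"].append(text)

    ranked = []
    for pid, entry in candidates.items():
        lex = max(lexical_overlap(query, t) for t in entry["texts"])
        # Hypothetical scoring rule: dense score plus weighted lexical overlap.
        ranked.append((entry["dense"] + alpha * lex, pid))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return [pid for _, pid in ranked[:k]]
```

In this sketch a paragraph can enter the candidate pool through either view, so a synthesis step that is buried in a long paragraph but stated compactly in its summary still has a chance to reach the top ranks. In practice the vectors would come from a dense encoder such as BGE-large-en-v1.5 rather than being supplied by hand.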
