CL AI IR | May 28, 2025

Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO

Ran Li, Shimin Di, Yuchen Liu, Chen Jing, Yu Qiu, Lei Chen

arXiv:2505.22068v1 | 4.91 citations | h-index: 10 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving LLMs for scientific information extraction, which requires both reasoning and memorization. The approach appears incremental, however, since it builds on existing SFT and RLVR techniques.

The paper tackles scientific information extraction (SciIE), a task on which large language models (LLMs) underperform smaller models. It proposes a two-stage training method, MimicSFT followed by R²GRPO, to improve reasoning capacity. The resulting model surpasses baseline LLMs and specialized supervised models in relation extraction.

Previous studies suggest that powerful Large Language Models (LLMs) trained with Reinforcement Learning with Verifiable Rewards (RLVR) only refine the reasoning path without improving reasoning capacity in math tasks, whereas supervised fine-tuning (SFT) with distillation can improve it. We study this from the view of scientific information extraction (SciIE), where LLMs and reasoning LLMs underperform small BERT-based models. SciIE requires both reasoning and memorization. We argue that both SFT and RLVR can refine the reasoning path and improve reasoning capacity in a simple way based on SciIE. We propose two-stage training with (1) MimicSFT, which uses structured reasoning templates without needing high-quality chain-of-thought data, and (2) R$^2$GRPO, with relevance and rule-induced rewards. Experiments on scientific IE benchmarks show that both methods can improve reasoning capacity. R$^2$GRPO with MimicSFT surpasses baseline LLMs and specialized supervised models in relation extraction. Our code is available at https://github.com/ranlislz/R2GRPO.
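To make the second stage more concrete, here is a minimal sketch of how a GRPO-style update can turn several sampled outputs for one input into group-relative advantages. The abstract does not define the relevance or rule-induced rewards, so the two scoring functions below (`format_reward`, `f1_reward`), the `<reasoning>` tag, the equal weights, and the use of population standard deviation are all assumptions made for illustration. They are stand-ins, not the rewards used in the paper or its repository.

```python
from statistics import mean, pstdev

def format_reward(text: str) -> float:
    """Hypothetical rule check: output must open with a closed <reasoning> block."""
    t = text.strip()
    return 1.0 if t.startswith("<reasoning>") and "</reasoning>" in t else 0.0

def f1_reward(predicted: set, gold: set) -> float:
    """Hypothetical task score: F1 between predicted and gold relation triples."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

def total_reward(text: str, predicted: set, gold: set,
                 w_rule: float = 0.5, w_task: float = 0.5) -> float:
    """Weighted sum of the two stand-in scores (weights are assumptions)."""
    return w_rule * format_reward(text) + w_task * f1_reward(predicted, gold)

def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy group: two sampled outputs for the same sentence.
gold = {("BERT", "used-for", "relation extraction")}
samples = [
    ("<reasoning>BERT is applied to RE.</reasoning> ...", set(gold)),
    ("BERT, relation extraction", set()),
]
rewards = [total_reward(text, pred, gold) for text, pred in samples]
advantages = group_relative_advantages(rewards)
```

In this toy group the first sample satisfies both checks and the second satisfies neither, so the first receives a positive advantage and the second a negative one. The normalization depends only on rewards within the group, which is why no separate value model is needed.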

