CL Feb 25

Improving Parametric Knowledge Access in Reasoning Language Models

arXiv:2602.22193v14 citations h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses under-optimized parametric knowledge access in reasoning language models, a concern for AI and NLP applications. The advance is incremental because it builds on existing reinforcement learning methods.

The paper tackles the problem that reasoning language models do not access their own stored world knowledge effectively. It finds that training them with reinforcement learning on world-knowledge question answering helps: accuracy improves by 9.9% on TriviaQA, the training dataset, and by up to 4.2% on other datasets.

We study reasoning for accessing world knowledge stored in a language model's parameters. For example, recalling that Canberra is Australia's capital may benefit from thinking through major cities and the concept of purpose-built capitals. While reasoning language models are trained via reinforcement learning to produce reasoning traces on tasks such as mathematics, they may not reason well for accessing their own world knowledge. We first find that models do not generate their best world knowledge reasoning by default: adding a simple "think step-by-step" cue demonstrates statistically significant improvement in knowledge recall but not math. Motivated by this, we propose training models to reason over their parametric knowledge using world-knowledge question answering as a verifiable reward. After reinforcement learning on TriviaQA (+9.9%), performance also improves on Natural Questions, HotpotQA, SimpleQA, and StrategyQA by 4.2%, 2.1%, 0.6%, and 3.0%, respectively. Reasoning models are under-optimized for parametric knowledge access, but can be easily trained to reason better.
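The abstract describes world-knowledge question answering as a verifiable reward but does not give the reward's exact form. A minimal sketch follows, assuming a binary reward based on normalized exact match against a list of accepted answers. The function names (`normalize`, `extract_final_answer`, `qa_reward`), the `Answer:` marker convention, and the normalization rules are illustrative assumptions, not the authors' implementation.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_final_answer(completion: str, marker: str = "Answer:") -> str:
    """Return the text after the last marker, or "" if the marker is absent.

    The "Answer:" marker is an assumed output convention for this sketch.
    """
    idx = completion.rfind(marker)
    if idx == -1:
        return ""
    return completion[idx + len(marker):].strip()


def qa_reward(completion: str, gold_answers: list[str]) -> float:
    """Binary reward: 1.0 if the final answer matches any accepted answer."""
    prediction = normalize(extract_final_answer(completion))
    if not prediction:
        return 0.0  # no parseable answer, so no reward
    matched = any(prediction == normalize(gold) for gold in gold_answers)
    return 1.0 if matched else 0.0


# Usage: the reasoning trace is ignored; only the final answer is scored.
trace = (
    "Sydney and Melbourne are the largest cities, but Australia built "
    "a dedicated capital between them. Answer: Canberra."
)
reward = qa_reward(trace, ["Canberra"])
```

Scoring only the final answer leaves the reasoning trace unconstrained, so under this assumption the model is free to learn whichever recall strategy raises answer accuracy.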
