AI · CL · Sep 29, 2025

Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

arXiv:2509.24156v1 · 8 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a critical limitation in fine-tuning paradigms for AI reasoning models. The advance is incremental: it builds on existing research to improve model reliability and generalizability.

The study investigates why large reasoning models (LRMs) often produce answers that contradict their own reasoning traces. It attributes the inconsistency to two competing mechanisms, Chain-of-Thought reasoning and memory retrieval, and finds that factors such as problem domain and model scale influence which mechanism dominates. Retrieval can act as a shortcut that undermines genuine reasoning.

Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities.
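
The controlled experiment described in the abstract can be pictured with a small bookkeeping sketch. This is not the paper's code: the `Trial` fields, the label names, and the assumption that the misleading cue and the corrupted memory point to two different wrong answers are all hypothetical, chosen only to show how a final answer might be attributed to one mechanism or the other.

```python
from collections import Counter
from dataclasses import dataclass

LABELS = ("unaffected", "reasoning", "retrieval", "other")


@dataclass(frozen=True)
class Trial:
    """One perturbed question (all field names are hypothetical)."""
    final_answer: str      # answer the model actually produced
    gold_answer: str       # correct answer
    cue_answer: str        # wrong answer the misleading reasoning cue points to
    corrupted_answer: str  # wrong answer planted for the retrieval path


def attribute(trial: Trial) -> str:
    """Label which perturbation, if any, the final answer followed.

    Assumes cue_answer and corrupted_answer differ from each other and
    from gold_answer, so the labels are mutually exclusive.
    """
    if trial.final_answer == trial.gold_answer:
        return "unaffected"
    if trial.final_answer == trial.cue_answer:
        return "reasoning"
    if trial.final_answer == trial.corrupted_answer:
        return "retrieval"
    return "other"


def attribution_rates(trials):
    """Return the fraction of trials that received each label."""
    if not trials:
        raise ValueError("need at least one trial")
    counts = Counter(attribute(t) for t in trials)
    return {label: counts[label] / len(trials) for label in LABELS}


# Toy data, invented purely for illustration.
trials = [
    Trial("B", "A", "B", "C"),  # followed the misleading cue
    Trial("C", "A", "B", "C"),  # followed the corrupted memory
    Trial("C", "A", "B", "C"),  # followed the corrupted memory
    Trial("A", "A", "B", "C"),  # resisted both perturbations
]
rates = attribution_rates(trials)
```

On this toy data, half of the answers follow the corrupted memory, which under these assumptions would be read as retrieval-dominant behavior. Comparing such rates across problem domains, model scales, or fine-tuning approaches is one way the relative dominance mentioned in the abstract could be tabulated.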
