LG · Mar 20, 2025

Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them

arXiv:2503.16401v2 · 4 citations · h-index: 8 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the problem of understanding reasoning capabilities in LLMs, a question relevant to AI researchers, but its contribution is incremental because it builds on existing fine-tuning and evaluation methods.

The paper investigates whether large language models (LLMs) engage in abstract reasoning beyond memorization. It uses Misleading Fine-Tuning (MisFT) to teach models rules that contradict correct principles, then tests whether they apply those rules in unseen domains. The models do apply the contradictory rules to math word problems and natural language reasoning tasks, which suggests an internal abstraction mechanism.

Large language models (LLMs) have been able to perform various forms of reasoning tasks in a wide range of scenarios, but are they truly engaging in task abstraction and rule-based reasoning beyond mere memorization? To answer this question, we propose a novel experimental approach, Misleading Fine-Tuning (MisFT), to examine whether LLMs perform abstract reasoning by altering their original understanding of fundamental rules. In particular, by constructing datasets with math expressions or logical formulas that contradict correct principles, we fine-tune the model to learn those contradictory rules and assess its generalization ability on unseen test domains. Through a series of experiments, we find that current LLMs are capable of applying contradictory rules to solve practical math word problems and natural language reasoning tasks, implying the presence of an internal mechanism in LLMs that abstracts before reasoning.
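
To make the dataset-construction step concrete, the following is a minimal sketch of how a MisFT-style fine-tuning set for arithmetic might be generated. The specific contradictory rule (treating `a + b` as `a + b + offset`), the function names `misleading_add` and `build_misft_examples`, and the prompt/completion record format are all assumptions made for illustration; the abstract does not specify which rules or formats the authors use.

```python
import random


def misleading_add(a, b, offset=1):
    """Hypothetical contradictory rule: 'a + b' is redefined as a + b + offset."""
    return a + b + offset


def build_misft_examples(n, seed=0, offset=1, low=0, high=99):
    """Build n prompt/completion pairs whose answers follow the misleading rule.

    The record layout (prompt, completion) is an assumed format, not the
    paper's actual data schema.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for _ in range(n):
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        examples.append(
            {
                "prompt": f"{a} + {b} =",
                "completion": str(misleading_add(a, b, offset)),
            }
        )
    return examples


examples = build_misft_examples(3)
for ex in examples:
    print(ex["prompt"], ex["completion"])
```

Under this sketch, a model fine-tuned on such pairs would then be evaluated on a different surface form, for example a word problem involving addition, and scored against the answer implied by the misleading rule rather than by ordinary arithmetic.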
