May 21, 2025

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

ETH Zurich
arXiv:2505.15607v2 | 10 citations | h-index: 40
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making LLMs more pedagogically effective for educational applications, representing an incremental improvement by adapting existing RL methods to a specific domain.

The paper tackles the problem of aligning large language models (LLMs) with effective pedagogy. Using an online reinforcement learning framework, it optimizes them to act as tutors that guide problem-solving rather than answer questions directly. The result is a 7B-parameter tutor model that performs similarly to larger proprietary models such as LearnLM, with a controllable reward weighting that balances pedagogical support against student accuracy.

Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy, which requires strategically withholding answers. To mitigate this, we propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors using simulated student-tutor interactions, emphasizing pedagogical quality and guided problem-solving over simply giving away answers. We use our method to train a 7B-parameter tutor model without human annotations that reaches performance similar to larger proprietary models like LearnLM. We introduce a controllable reward weighting to balance pedagogical support and student solving accuracy, allowing us to trace the Pareto frontier between these two objectives. Our models better preserve reasoning capabilities than single-turn SFT baselines and can optionally enhance interpretability through thinking tags that expose the model's instructional planning.
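
The idea of a controllable reward weighting can be illustrated with a minimal Python sketch. It assumes a simple linear combination of two per-dialogue scores, a hypothetical `solve_rate` and `pedagogy_score` in [0, 1], mixed by a hypothetical weight `lam`. The paper's actual reward design may differ. Sweeping `lam` shifts which behaviour a reward-maximizing tutor prefers, and keeping only non-dominated (accuracy, pedagogy) pairs is one way to trace a Pareto frontier between the two objectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueScore:
    """Scores for one simulated tutoring dialogue (hypothetical fields, both in [0, 1])."""
    solve_rate: float      # assumed: whether the simulated student reached the correct answer
    pedagogy_score: float  # assumed: judged pedagogical quality of the tutor's turns


def weighted_reward(score: DialogueScore, lam: float) -> float:
    """Hypothetical linear scalarization: lam=0 rewards only solving, lam=1 only pedagogy."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * score.solve_rate + lam * score.pedagogy_score


def best_for_weight(candidates: list[DialogueScore], lam: float) -> DialogueScore:
    """Return the candidate with the highest weighted reward under weight lam."""
    return max(candidates, key=lambda s: weighted_reward(s, lam))


def pareto_front(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep (accuracy, pedagogy) pairs that no other pair matches or beats on both axes."""
    front = {
        p for p in points
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
    }
    return sorted(front)


if __name__ == "__main__":
    # Made-up toy scores for illustration only, not results from the paper.
    toy = [DialogueScore(0.9, 0.2), DialogueScore(0.7, 0.6), DialogueScore(0.4, 0.8)]
    for lam in (0.0, 0.5, 1.0):
        print(lam, best_for_weight(toy, lam))
    print(pareto_front([(0.9, 0.2), (0.7, 0.6), (0.5, 0.5), (0.4, 0.8)]))
```

With these toy values, `lam=0.0` favours the high-accuracy dialogue, `lam=1.0` the high-pedagogy one, and `lam=0.5` the balanced one. Note that a linear weighting can only reach points on the convex hull of the frontier, so a non-convex stretch of the trade-off would not be recovered by this sweep alone.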

Code implementations: 1 repo