CL AI · May 21, 2025

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan

ETH Zurich

arXiv:2505.15607v2 · 19.410 citations · h-index: 65 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making LLMs more pedagogically effective for educational applications, representing an incremental improvement by adapting existing RL methods to a specific domain.

The paper tackles the problem of aligning large language models (LLMs) with effective pedagogy. It uses an online reinforcement learning framework to optimize them to act as tutors that guide problem-solving rather than answer questions directly. The result is a 7B-parameter tutor model that performs similarly to larger proprietary models like LearnLM, with a controllable reward weighting that balances pedagogical support against student accuracy.

Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy, which requires strategically withholding answers. To mitigate this, we propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors using simulated student-tutor interactions by emphasizing pedagogical quality and guided problem-solving over simply giving away answers. We use our method to train a 7B-parameter tutor model, without human annotations, that reaches similar performance to larger proprietary models like LearnLM. We introduce a controllable reward weighting to balance pedagogical support and student solving accuracy, allowing us to trace the Pareto frontier between these two objectives. Our models better preserve reasoning capabilities than single-turn SFT baselines and can optionally enhance interpretability through thinking tags that expose the model's instructional planning.
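The controllable reward weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formula, so the convex combination below is an assumption chosen for illustration, not the authors' formulation. The names `EpisodeOutcome`, `combined_reward`, `sweep_weights`, and the `weight` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    """Hypothetical summary of one simulated tutor-student dialogue."""
    student_solved: bool   # did the simulated student reach the correct answer?
    pedagogy_score: float  # judged pedagogical quality, assumed to lie in [0, 1]


def combined_reward(outcome: EpisodeOutcome, weight: float) -> float:
    """Scalar reward as a convex combination of the two objectives.

    ASSUMPTION: this linear form is illustrative only; the paper's
    actual reward is not specified in the abstract.
    weight = 0 rewards only student accuracy,
    weight = 1 rewards only pedagogical quality.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if not 0.0 <= outcome.pedagogy_score <= 1.0:
        raise ValueError("pedagogy_score must lie in [0, 1]")
    accuracy = 1.0 if outcome.student_solved else 0.0
    return (1.0 - weight) * accuracy + weight * outcome.pedagogy_score


def sweep_weights(outcome: EpisodeOutcome, weights: list[float]) -> list[tuple[float, float]]:
    """Evaluate the same outcome under several weightings."""
    return [(w, combined_reward(outcome, w)) for w in weights]


if __name__ == "__main__":
    episode = EpisodeOutcome(student_solved=True, pedagogy_score=0.5)
    for w, r in sweep_weights(episode, [0.0, 0.5, 1.0]):
        print(f"weight={w:.1f} reward={r:.2f}")
```

Under this assumed form, tracing a trade-off curve would mean training one tutor per weight value and plotting the resulting student accuracy against pedagogical quality.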
