CL · AI · LG · Jul 12, 2024

Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

ETH Zurich
arXiv:2407.09136v1 · 41 citations · h-index: 40
AI Analysis

This work addresses the challenge of scaling personalized education with AI tutors, but it is incremental, since it builds on existing error-detection methods within a specific domain.

The paper tackles a known weakness of large language models (LLMs) in tutoring: they struggle to detect student reasoning errors and to tailor feedback to them. The authors develop verifiers for error detection and show that grounding response generation in these verifiers yields tutor responses that are more often correct and less prone to hallucination than baseline responses.

Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach towards this end is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their response based on them, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation, we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors that are more often correct and contain fewer hallucinations than existing baselines.
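The task framing in the abstract (take a stepwise student solution, locate the first erroneous step, and compare against a teacher-annotated label) can be sketched as follows. This is not the paper's method: its verifiers are model-based, whereas the rule-based `check_step` below only handles toy steps of the form `a op b = c` and stands in for a real verifier. The names `check_step`, `first_error_step`, and `first_error_accuracy`, and the use of exact-match accuracy on the first-error index, are hypothetical choices made for illustration.

```python
import re
from fractions import Fraction
from typing import Optional, Sequence

# Hypothetical toy step format: "a op b = c", e.g. "12 * 3 = 36" or "41 / 2 = 41/2".
STEP_PATTERN = re.compile(r"^\s*(-?\d+)\s*([-+*/])\s*(-?\d+)\s*=\s*(-?\d+(?:/\d+)?)\s*$")


def check_step(step: str) -> bool:
    """Toy stand-in for a step verifier: True if the step looks correct."""
    m = STEP_PATTERN.match(step)
    if m is None:
        # Assumption: steps this toy checker cannot parse are not flagged.
        return True
    a, op, b, claimed = Fraction(m[1]), m[2], Fraction(m[3]), Fraction(m[4])
    if op == "/" and b == 0:
        return False  # division by zero is always an error
    results = {"+": a + b, "-": a - b, "*": a * b}
    actual = results[op] if op in results else a / b
    return actual == claimed


def first_error_step(steps: Sequence[str]) -> Optional[int]:
    """Return the 0-based index of the first rejected step, or None if all pass."""
    for i, step in enumerate(steps):
        if not check_step(step):
            return i
    return None


def first_error_accuracy(predicted: Sequence[Optional[int]],
                         gold: Sequence[Optional[int]]) -> float:
    """Hypothetical metric: exact-match rate between predicted and annotated first-error indices."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    solution = ["12 * 3 = 36", "36 + 4 = 41", "41 / 2 = 41/2"]
    print(first_error_step(solution))  # step 1 is wrong: 36 + 4 is 40, not 41
```

Reporting only the first error mirrors the dataset annotation described above: once a step is wrong, later steps may be internally consistent yet still build on the mistake, so the first error is the natural target for tutor feedback.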

Code Implementations: 1 repo
