CL | Feb 24, 2024

MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations

arXiv:2402.15861v5 | 27 citations | h-index: 15 | EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses a time-consuming task for K-8 teachers: writing educational math word problems for their students. The advance is incremental, since it finetunes an existing language model on expert feedback rather than introducing a new method.

The paper tackles the challenge of automatically generating educational math word problems for K-8 students. It uses teacher annotations to finetune a 70B language model; the resulting model, MATHWELL, outperforms public models in solvability, accuracy, and appropriateness. It also matches GPT-4's problem quality while producing more appropriate reading levels and avoiding harmful questions.

Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs; the teachers find that existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find that MATHWELL generates problems far more solvable, accurate, and appropriate than public models. MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions.

Code Implementations: 2 repos
