CL | Apr 24

SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs

Sihang Zhao, Kangrui Yu, Youliang Yuan, Pinjia He, Hongyi Wen

arXiv:2604.22134 | 53.0 | h-index: 7 | Has Code

AI Analysis

For developers of educational LLMs, this work addresses the critical problem of pedagogical jailbreaks, offering a benchmark and method to improve safety without sacrificing helpfulness.

The paper identifies pedagogical jailbreaks in educational LLMs, where students elicit answers instead of scaffolded instruction. It introduces SHAPE, a benchmark of 9,087 student-question pairs, and a graph-augmented tutoring pipeline that improves safety under adversarial pressure while maintaining high helpfulness.

Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs, pedagogical jailbreaks, where students use answer-inducing prompts to elicit solutions rather than scaffolded instructions. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness under the same evaluation protocol. Our code and data are available at https://github.com/MAPS-research/SHaPE
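The gating idea in the abstract (infer prerequisite concepts, find mastery gaps, then route between instructing and problem-solving) can be illustrated with a minimal sketch. This is not the authors' implementation: the graph contents, the `Student` class, the function names, the 0.7 mastery threshold, and the rule "instruct if any prerequisite is below threshold, otherwise solve" are all assumptions made for illustration. The step that maps a free-text query to a concept is skipped here; the sketch takes a concept name directly.

```python
from dataclasses import dataclass, field

# Hypothetical prerequisite graph: concept -> concepts it directly depends on.
PREREQS = {
    "quadratic_formula": ["factoring", "square_roots"],
    "factoring": ["multiplication"],
    "square_roots": ["multiplication"],
    "multiplication": [],
}


@dataclass
class Student:
    # Assumed representation: concept name -> mastery score in [0, 1].
    mastery: dict = field(default_factory=dict)


def prerequisites(concept, graph):
    """Return all transitive prerequisites of `concept`, excluding itself."""
    seen = []
    stack = list(graph.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return seen


def mastery_gaps(student, concept, graph, threshold=0.7):
    """List prerequisites whose mastery is below the (assumed) threshold."""
    return sorted(
        c for c in prerequisites(concept, graph)
        if student.mastery.get(c, 0.0) < threshold
    )


def route(student, concept, graph, threshold=0.7):
    """Explicit gate: instruct on gaps if any exist, otherwise allow solving."""
    gaps = mastery_gaps(student, concept, graph, threshold)
    if gaps:
        return ("instruct", gaps)
    return ("solve", [])


if __name__ == "__main__":
    learner = Student({"multiplication": 0.9, "factoring": 0.4, "square_roots": 0.8})
    print(route(learner, "quadratic_formula", PREREQS))
```

In this toy setup, an answer-inducing prompt would not change the route, because the gate depends only on the mastery record and the graph, not on how the request is phrased. Whether the paper's pipeline gates in the same way is not stated in the abstract.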

