CL CY LG · Mar 1, 2024

Improving Socratic Question Generation using Data Augmentation and Preference Optimization

arXiv:2403.00199v317.734 citations · h-index: 9 · Has Code · BEA

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of automating Socratic questioning for student learning. The contribution is incremental, since it applies existing RLAIF and DPO techniques to a specific domain.

The paper targets invalid Socratic questions generated by LLMs (e.g., questions that reveal the solution). It proposes a data augmentation method that enriches datasets with invalid examples, then uses direct preference optimization (DPO) to train models to prefer valid questions. The resulting DPO-optimized 7B Llama 2 model outperforms state-of-the-art prompting methods.

The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem. Although this method has been shown to significantly improve student learning outcomes, it remains a complex, labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B Llama 2 model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.
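The two steps in the abstract (pairing each ground-truth question with generated invalid ones, then optimizing with DPO) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `build_preference_pairs` and `dpo_loss`, the dictionary keys, the example strings, and the value `beta=0.1` are all assumptions made for illustration. The loss itself is the standard DPO objective for one pair, computed from sequence log-probabilities under the policy and a frozen reference model.

```python
import math


def build_preference_pairs(prompt, valid_question, invalid_questions):
    """Pair one ground-truth (valid) question with each invalid question.

    Hypothetical helper: the key names are illustrative, not taken from the paper.
    """
    return [
        {"prompt": prompt, "chosen": valid_question, "rejected": bad}
        for bad in invalid_questions
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the trainable policy and the frozen reference model.
    beta=0.1 is an assumed value, not one reported in the source.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Illustrative data only; these strings are made up for the example.
pairs = build_preference_pairs(
    prompt="Student code returns the wrong sum for an empty list.",
    valid_question="What does your loop do when the list has no elements?",
    invalid_questions=[
        "Did you forget to initialize total to 0 on line 2?",  # reveals the fix
        "Have you considered using a different language?",     # irrelevant
    ],
)
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that as the policy raises the chosen question's likelihood relative to the rejected one.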
