CL · CY · Oct 4, 2023

Can Language Models Employ the Socratic Method? Experiments with Code Debugging

Erfan Al-Hossami, Razvan Bunescu, Justin Smith, Ryan Teehan

arXiv:2310.03210v1 · 4.934 citations · h-index: 9 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for scalable automated tutoring in programming education. Its contribution is incremental, however: it focuses on dataset creation and benchmarking rather than on developing new methods.

The paper tackles the problem of automating Socratic teaching for code debugging. It introduces a manually created dataset of multi-turn Socratic advice and benchmarks language models such as Flan-T5 and GPT-4 on the task, finding that the models can provide guidance but with limited effectiveness (e.g., GPT-4 achieves only moderate accuracy in zero-shot settings).

When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale; however, their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used to benchmark the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain-of-thought prompting of the much larger GPT-4. The code and datasets are freely available for research at the following link: https://github.com/taisazero/socratic-debugging-benchmark
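The abstract describes the dataset only at a high level. As a rough illustration, one multi-turn Socratic debugging example, and a zero-shot prompt assembled from it, might be represented as in the sketch below. The class names, the field names (`problem`, `buggy_code`, `bug_description`, `turns`), the speaker labels, and the instruction wording are all hypothetical assumptions made for this sketch. They are not the schema or the prompts of the released benchmark; consult the linked repository for those.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    # Hypothetical speaker labels: "User" is the novice, "Assistant" the tutor.
    speaker: str
    text: str


@dataclass
class SocraticDialogue:
    # Hypothetical record layout for one debugging conversation.
    problem: str          # statement of the simple computational problem
    buggy_code: str       # the novice's incorrect solution
    bug_description: str  # short description of the bug, for the tutor's reference
    turns: List[Turn] = field(default_factory=list)


def build_zero_shot_prompt(d: SocraticDialogue,
                           include_bug_description: bool = False) -> str:
    """Assemble a zero-shot prompt asking a model for the next tutor utterance.

    The instruction text is an illustrative assumption, not the paper's prompt.
    The bug description is hidden by default and only added when requested.
    """
    history = "\n".join(f"{t.speaker}: {t.text}" for t in d.turns)
    bug_section = (
        f"Known bug (visible to the tutor only):\n{d.bug_description}\n\n"
        if include_bug_description
        else ""
    )
    return (
        "You are a Socratic programming tutor. Do not reveal the fix; "
        "ask a question that guides the student toward finding the bug.\n\n"
        f"Problem:\n{d.problem}\n\n"
        f"Student code:\n{d.buggy_code}\n\n"
        f"{bug_section}"
        f"Conversation so far:\n{history}\n"
        "Assistant:"
    )


if __name__ == "__main__":
    example = SocraticDialogue(
        problem="Return the sum of the integers in a list.",
        buggy_code=(
            "def total(xs):\n"
            "    s = 0\n"
            "    for i in range(1, len(xs)):\n"
            "        s += xs[i]\n"
            "    return s"
        ),
        bug_description="The loop starts at index 1, skipping the first element.",
        turns=[Turn("User", "My function returns the wrong sum and I don't know why.")],
    )
    print(build_zero_shot_prompt(example))
```

Keeping the bug description out of the prompt by default mirrors the Socratic constraint of guiding rather than telling. Whether a tutor model should see it is a design choice that this sketch leaves as a flag.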

