CL · CY · Oct 4, 2023

Can Language Models Employ the Socratic Method? Experiments with Code Debugging

arXiv:2310.03210v1 · 34 citations · h-index: 9 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for scalable automated tutoring in programming education, though it is incremental as it focuses on dataset creation and benchmarking rather than novel method development.

The paper tackles the problem of automating Socratic teaching for code debugging. It introduces a manually created dataset of multi-turn Socratic advice and benchmarks language models such as Flan-T5 and GPT-4 on this task. The results show that models can provide guidance, but with limited effectiveness (e.g., GPT-4 achieves moderate accuracy in zero-shot settings).

When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale; however, their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain-of-thought prompting of the much larger GPT-4. The code and datasets are made freely available for research at the link below. https://github.com/taisazero/socratic-debugging-benchmark

Code Implementations: 1 repo