QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners
This paper addresses the high human cost of handling student queries for Chinese Python learners, but its contribution is incremental: it focuses on dataset creation without introducing new methods.
The paper tackles the scarcity of data for training large language models (LLMs) as intelligent assistants in programming education by proposing QACP, a new annotated Chinese question-answering dataset for Python learners, and evaluates various LLMs to highlight their limitations in this domain.
On online learning platforms, particularly in rapidly growing computer programming courses, answering thousands of student queries incurs considerable human cost. Building intelligent-assistant large language models (LLMs) tailored to programming education requires dedicated data support. However, in real application scenarios, the data resources for training such LLMs are relatively scarce. Therefore, to address data scarcity in intelligent educational systems for programming, this paper proposes a new Chinese question-and-answer dataset for Python learners. To ensure the authenticity and reliability of the question sources, we collected the questions from real student queries and categorized them along several dimensions, such as the type of question and the type of learner. This annotation principle is designed to enhance the effectiveness and quality of online programming education, providing a solid data foundation for developing programming teaching assistants (TAs). Furthermore, we conducted comprehensive evaluations of various LLMs proficient in processing and generating Chinese content, highlighting the potential limitations of general-purpose LLMs as intelligent teaching assistants in computer programming courses.
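As a rough illustration of the annotation scheme described above, one record could pair a question and answer with two category labels. This is a minimal sketch only: the field names (`question_type`, `learner_type`), the label values, and the sample records are all invented for illustration and are not taken from QACP or its actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """One annotated question-answer pair (hypothetical schema)."""
    question: str
    answer: str
    question_type: str  # hypothetical label, e.g. "debugging" or "concept"
    learner_type: str   # hypothetical label, e.g. "beginner" or "intermediate"


# Toy records invented for this sketch; they are not dataset entries.
records = [
    QARecord("为什么我的循环不会停止?", "检查循环条件是否会变为 False。",
             "debugging", "beginner"),
    QARecord("列表和元组有什么区别?", "列表可变,元组不可变。",
             "concept", "beginner"),
    QARecord("为什么会出现 IndexError?", "索引超出了序列的长度。",
             "debugging", "intermediate"),
]


def count_by(items, field_name):
    """Count records per value of one annotation dimension."""
    return Counter(getattr(item, field_name) for item in items)


def select(items, **labels):
    """Return the records whose annotations match every given label."""
    return [
        item for item in items
        if all(getattr(item, name) == value for name, value in labels.items())
    ]
```

Under these assumptions, `count_by(records, "question_type")` tallies questions per category, and `select(records, learner_type="beginner")` extracts a subset that could be used to evaluate an LLM on one group of learners at a time.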