CL | Apr 3, 2024

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

arXiv:2404.02540v2 | 4 citations | h-index: 10 | ISMIS
AI Analysis

This paper addresses the problem of understanding the potential misuse and impact of AI-generated content in CS education at schools and universities. The contribution is incremental, since it builds on existing benchmarking efforts.

The authors address the need to assess the impact of large language models (LLMs) on computer science education by introducing CSEPrompts, a benchmark with hundreds of programming exercise prompts and multiple-choice questions drawn from introductory courses. They evaluate several LLMs on generating Python code and answering basic CS questions.

Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ChatGPT) have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes. Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse. Educational programs in Computer Science (CS) and related fields are particularly affected because LLMs are also capable of generating programming code in various programming languages. To help understand the potential impact of publicly available LLMs in CS education, we introduce CSEPrompts, a framework with hundreds of programming exercise prompts and multiple-choice questions retrieved from introductory CS and programming courses. We also provide experimental results on CSEPrompts to evaluate the performance of several LLMs with respect to generating Python code and answering basic computer science and programming questions.

Code Implementations: 1 repo
