Ute Heuer

h-index 6

4 papers

94 citations

Novelty 38%

AI Score 30

Ranked #139,114 of 194,257 authors (top 72%); #1,572 in SE (top 52%)

4 Papers

4.9 | CL | Apr 24, 2023

AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays

Steffen Herbold, Annette Hautli-Janisz, Ute Heuer et al.

Background: Recently, ChatGPT and similar generative AI models have attracted hundreds of millions of users and become part of the public discourse. Many believe that such models will disrupt society and will result in a significant change in the education system and information generation in the future. So far, this belief is based on either colloquial evidence or benchmarks from the owners of the models -- both lack scientific rigour. Objective: Through a large-scale study comparing human-written versus ChatGPT-generated argumentative student essays, we systematically assess the quality of the AI-generated content. Methods: A large corpus of essays was rated using standard criteria by a large number of human experts (teachers). We augment the analysis with a consideration of the linguistic characteristics of the generated essays. Results: Our results demonstrate that ChatGPT generates essays that are rated higher for quality than human-written essays. The writing style of the AI models exhibits linguistic characteristics that are different from those of the human-written essays, e.g., it is characterized by fewer discourse and epistemic markers, but more nominalizations and greater lexical diversity. Conclusions: Our results clearly demonstrate that models like ChatGPT outperform humans in generating argumentative essays. Since the technology is readily available for anyone to use, educators must act immediately. We must re-invent homework and develop teaching concepts that utilize these AI models in the same way as math utilized the calculator: teach the general concepts first and then use AI tools to free up time for other learning objectives.

16.7 | SE | Feb 15, 2021 | Code

LitterBox: A Linter for Scratch Programs

Gordon Fraser, Ute Heuer, Nina Körber et al.

Creating programs with block-based programming languages like Scratch is easy and fun. Block-based programs can nevertheless contain bugs, in particular when learners have misconceptions about programming. Even when they do not, Scratch code is often of low quality and contains code smells, further inhibiting understanding, reuse, and fun. To address this problem, in this paper we introduce LitterBox, a linter for Scratch programs. Given a program or its public project ID, LitterBox checks the program against patterns of known bugs and code smells. For each issue identified, LitterBox provides not only the location in the code, but also a helpful explanation of the underlying reason and possible misconceptions. Learners can access LitterBox through an easy to use web interface with visual information about the errors in the block-code, while for researchers LitterBox provides a general, open source, and extensible framework for static analysis of Scratch programs.

8.6 | SE | Aug 13, 2021 | Code

Code Perfumes: Reporting Good Code to Encourage Learners

Florian Obermüller, Lena Bloch, Luisa Greifenstein et al.

Block-based programming languages like Scratch enable children to be creative while learning to program. Even though the block-based approach simplifies the creation of programs, learning to program can nevertheless be challenging. Automated tools such as linters therefore support learners by providing feedback about potential bugs or code smells in their programs. Even when this feedback is elaborate and constructive, it still represents purely negative criticism and by construction ignores what learners have done correctly in their programs. In this paper we introduce an orthogonal approach to linting: We complement the criticism produced by a linter with positive feedback. We introduce the concept of code perfumes as the counterpart to code smells, indicating the correct application of programming practices considered to be good. By analysing not only what learners did wrong but also what they did right we hope to encourage learners, to provide teachers and students a better understanding of learners' progress, and to support the adoption of automated feedback tools. Using a catalogue of 25 code perfumes for Scratch, we empirically demonstrate that these represent frequent practices in Scratch, and we find that better programs indeed contain more code perfumes.

13.3 | SE | May 12, 2021 | Code

Guiding Next-Step Hint Generation Using Automated Tests

Florian Obermüller, Ute Heuer, Gordon Fraser

Learning basic programming with Scratch can be hard for novices and tutors alike: Students may not know how to advance when solving a task, teachers may face classrooms with many raised hands at a time, and the problem is exacerbated when novices are on their own in online or virtual lessons. It is therefore desirable to generate next-step hints automatically to provide individual feedback for students who are stuck, but current approaches rely on the availability of multiple hand-crafted or hand-selected sample solutions from which to draw valid hints, and have not been adapted for Scratch. Automated testing provides an opportunity to automatically select suitable candidate solutions for hint generation, even from a pool of student solutions using different solution approaches and varying in quality. In this paper we present Catnip, the first next-step hint generation approach for Scratch, which extends existing data-driven hint generation approaches with automated testing. Evaluation of Catnip on a dataset of student Scratch programs demonstrates that the generated hints point towards functional improvements, and the use of automated tests allows the hints to be better individualized for the chosen solution path.