CYAICLETDec 9, 2024

Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben

arXiv:2412.06651v54 citationsh-index: 1
Originality Synthesis-oriented
AI Analysis

It addresses the problem of unreliable AI tools in education for teachers and students, highlighting incremental testing of an existing tool on new data.

This study evaluated the AI Grading Assistant by Fobizz for grading student assignments and found significant shortcomings, such as random grades and feedback that did not improve with suggestions, with the highest ratings only achievable using ChatGPT-generated texts.

This study examines the AI-powered grading tool "AI Grading Assistant" by the German company Fobizz, designed to support teachers in evaluating and providing feedback on student assignments. Against the societal backdrop of an overburdened education system and rising expectations for artificial intelligence as a solution to these challenges, the investigation evaluates the tool's functional suitability through two test series. The results reveal significant shortcomings: The tool's numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated. The highest ratings are achievable only with texts generated by ChatGPT. False claims and nonsensical submissions frequently go undetected, while the implementation of some grading criteria is unreliable and opaque. Since these deficiencies stem from the inherent limitations of large language models (LLMs), fundamental improvements to this or similar tools are not immediately foreseeable. The study critiques the broader trend of adopting AI as a quick fix for systemic problems in education, concluding that Fobizz's marketing of the tool as an objective and time-saving solution is misleading and irresponsible. Finally, the study calls for systematic evaluation and subject-specific pedagogical scrutiny of the use of AI tools in educational contexts.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes