AI · Feb 25, 2025

PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback

arXiv:2502.18425v1 · h-index: 5 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses two problems in STEM education: the grading workload for tutors and delayed feedback for students. It appears incremental, since it builds on existing automated grading approaches and adds privacy and tutor-control enhancements.

The authors address the slow, labor-intensive grading of STEM student assignments with PyEvalAI, an AI-assisted system that automatically evaluates Jupyter notebooks using unit tests and a locally hosted language model. In a case study, it improved feedback speed and grading efficiency for a university numerics course.

Grading student assignments in STEM courses is a laborious and repetitive task for tutors, often requiring a week to assess an entire class. For students, this delay of feedback prevents iterating on incorrect solutions, hampers learning, and increases stress when exercise scores determine admission to the final exam. Recent advances in AI-assisted education, such as automated grading and tutoring systems, aim to address these challenges by providing immediate feedback and reducing grading workload. However, existing solutions often fall short due to privacy concerns, reliance on proprietary closed-source models, lack of support for combining Markdown, LaTeX and Python code, or excluding course tutors from the grading process. To overcome these limitations, we introduce PyEvalAI, an AI-assisted evaluation system, which automatically scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. Our approach is free, open-source, and ensures tutors maintain full control over the grading process. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
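The unit-test half of the approach described in the abstract can be illustrated with a minimal sketch. This is not PyEvalAI's actual API: the names `STUDENT_CELL`, `load_submission`, `score_submission`, and `TESTS`, the trapezoid-rule exercise, and the point values are all hypothetical, chosen only to show how a notebook code cell might be executed and scored against tests. The language-model part of the system is not shown.

```python
import math

# Hypothetical student submission, standing in for the source of one
# notebook code cell (an assumption for illustration only).
STUDENT_CELL = """
def trapezoid(f, a, b, n):
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total
"""


def load_submission(source):
    """Execute cell source in a fresh namespace and return that namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace


def score_submission(namespace, tests):
    """Run each (name, points, check) test; return earned points and feedback lines."""
    earned, feedback = 0, []
    for name, points, check in tests:
        try:
            passed = bool(check(namespace))
        except Exception as exc:
            # A missing function or a crash counts as a failed test.
            passed = False
            feedback.append(f"{name}: error ({type(exc).__name__})")
        else:
            feedback.append(f"{name}: {'passed' if passed else 'failed'}")
        if passed:
            earned += points
    return earned, feedback


# Hypothetical tests and point values for a numerics exercise.
TESTS = [
    ("linear function is integrated exactly", 1,
     lambda ns: math.isclose(ns["trapezoid"](lambda x: x, 0.0, 1.0, 4), 0.5)),
    ("sin on [0, pi] is close to 2", 2,
     lambda ns: abs(ns["trapezoid"](math.sin, 0.0, math.pi, 1000) - 2.0) < 1e-4),
]

earned, feedback = score_submission(load_submission(STUDENT_CELL), TESTS)
```

Running `exec` on student code is only reasonable inside a sandbox. A real deployment would also need timeouts and isolation, which this sketch omits.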

