SE · Apr 3, 2025
Level Up Peer Review in Education: Investigating genAI-driven Gamification system and its influence on Peer Feedback Effectiveness
Rafal Wlodarski, Leonardo da Silva Sousa, Allison Connell Pensky
In software engineering (SE), the ability to review code and critique designs is essential for professional practice. However, these skills are rarely emphasized in formal education, and peer feedback quality and engagement can vary significantly among students. This paper introduces Socratique, a gamified peer-assessment platform integrated with Generative AI (GenAI) assistance, designed to develop students' peer-review skills in a functional programming course. By incorporating game elements, Socratique aims to motivate students to provide more feedback, while the GenAI assistant offers real-time support in crafting high-quality, constructive comments. To evaluate the impact of this approach, we conducted a randomized controlled experiment with master's students, comparing a treatment group with a gamified, GenAI-driven setup against a control group with minimal gamification. Results show that students in the treatment group provided significantly more voluntary feedback, with higher scores on clarity, relevance, and specificity, all key aspects of effective code and design reviews. This study provides evidence for the effectiveness of combining gamification and AI to improve peer review processes, with implications for fostering review-related competencies in software engineering curricula.
2.1 · SE · Mar 31
From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering
Rafal Wlodarski
Software engineering courses often require rapid upskilling in supporting knowledge areas such as domain understanding and modeling methods. We report an experience from a two-week milestone in a master's course where 29 students used a customized ChatGPT (GPT-3.5) tutor grounded in a curated course knowledge base to learn cryptocurrency-finance basics and Domain-Driven Design (DDD). We logged all interactions and evaluated a 34.5% random sample of prompt-answer pairs (60/~174) with a five-dimension rubric (accuracy, relevance, pedagogical value, cognitive load, supportiveness), and we collected pre/post self-efficacy. Responses were consistently accurate and relevant in this setting: accuracy averaged 98.9% with no factual errors and only 2/60 minor inaccuracies, and relevance averaged 92.2%. Pedagogical value was high (89.4%) with generally appropriate cognitive load (82.78%), but supportiveness was low (37.78%). Students reported large pre-post self-efficacy gains for genAI-assisted domain learning and DDD application. From these observations we distill seventeen concrete teaching practices spanning prompt/configuration and course/workflow design (e.g., setting expected granularity, constraining verbosity, curating guardrail examples, adding small credit with a simple quality rubric). Within this single-course context, results suggest that genAI-supported learning can complement instruction in domain understanding and modeling tasks, while leaving room to improve tone and follow-up structure.