Brian Magerko

CYNov 28, 2025

Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants

Shi Ding, Brian Magerko

As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency while overlooking human identity, learner agency, contextual learning processes, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned framework with measurable indicators and a practical toolkit for guiding the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and toolkit checklist provide a foundation for scalable, value-aligned AI evaluation in education. TEACH-AI rethinks "evaluation" through sociotechnical, educational, theoretical, and applied lenses, engaging designers, developers, researchers, and policymakers across AI and education. Our work invites the community to reconsider what constructs "effective" AI in education and to design model evaluation approaches that promote co-creation, inclusivity, and long-term human, social, and educational impact.

AIJun 5, 2021

Empirically Evaluating Creative Arc Negotiation for Improvisational Decision-making

Mikhail Jacob, Brian Magerko

Action selection from many options with few constraints is crucial for improvisation and co-creativity. Our previous work proposed creative arc negotiation to solve this problem, i.e., selecting actions to follow an author-defined `creative arc' or trajectory over estimates of novelty, unexpectedness, and quality for potential actions. The CARNIVAL agent architecture demonstrated this approach for playing the Props game from improv theatre in the Robot Improv Circus installation. This article evaluates the creative arc negotiation experience with CARNIVAL through two crowdsourced observer studies and one improviser laboratory study. The studies focus on subjects' ability to identify creative arcs in performance and their preference for creative arc negotiation compared to a random selection baseline. Our results show empirically that observers successfully identified creative arcs in performances. Both groups also preferred creative arc negotiation in agent creativity and logical coherence, while observers enjoyed it more too.

Brian Magerko

2 Papers