CY May 24

Generative AI as a Design Variable: An Evidence-Centered Framework for Principled Governance in STEM Assessment

arXiv:2605.24837
AI Analysis

For assessment designers and educators, this framework provides actionable guidance to preserve assessment validity while preparing students for AI-enabled workplaces.

This paper proposes an Evidence-Centered Design framework that treats GenAI as a design variable in STEM assessment, specifying when to restrict, scaffold, or require its use. Two task designs in an introductory physics course show that disciplinary AI interaction competencies are observable and scorable with defensible rubrics.

Generative Artificial Intelligence (GenAI) presents a governance challenge for STEM assessment. Unrestricted GenAI access enables task outsourcing that undermines the validity of traditional assessments; blanket prohibitions are difficult to enforce, may push use underground, and do little to prepare students for workplaces where GenAI-supported workflows are increasingly common. This paper addresses this dilemma by proposing a framework grounded in Evidence-Centered Design (ECD) that treats GenAI as a design variable within the assessment argument rather than an external threat to it. The framework analyzes how GenAI reshapes the student model, evidence model, and task model, and uses this analysis to articulate three principled governance stances. Restrict is warranted when GenAI would contaminate the inferential link between student work products and targeted unaided proficiency. Scaffold is warranted when bounded GenAI assistance can handle peripheral demands without substituting for the target construct, preserving inferential interpretability. Require is warranted when the target construct is disciplinary AI interaction competency and tasks can be designed to elicit process artifacts, including prompts, critiques, and revisions, that make student reasoning observable, scorable, and distinguishable from AI-generated output. We present two task designs deployed in an introductory physics course and demonstrate that disciplinary AI interaction competencies are observable in student response artifacts and can be scored using defensible rubrics grounded in student data and expert knowledge. By situating GenAI governance within validity arguments, the framework offers actionable guidance for preserving learning integrity while supporting authentic preparation for AI-enabled professional environments.
