AI · CL · Jun 4

Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI

arXiv:2606.05983 · 50.9

Predicted impact: top 80% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For educators and researchers, this provides a diagnostic framework to assess and improve students' ability to reason with generative AI, moving beyond a single 'prompting' score.

The authors propose CoRe-3, a competency model that decomposes productive AI use into three assessable skills (Framing, Judging, Steering), and instantiate it in an open platform. Over simulated learners, the skills show convergent and discriminant validity, with each tracking its own manipulated competence while remaining flat in others.

Generative AI makes answers easy and understanding hard, and uncritical use invites cognitive offloading. Schools still measure unaided performance, yet the real task is to produce good work with AI: framing an ill-defined task, judging the output, and steering the model toward a better result. This ability is rarely assessed in its own right; where it is measured, it collapses into a single "prompting" score that cannot diagnose why AI use succeeds or fails. We propose CoRe-3 (Co-Reasoning), a competency model that factors productive AI use into three assessable skills, abbreviated FJS: Framing (specifying an ill-defined task before invoking AI), Judging (evaluating output for errors and unstated assumptions), and Steering (iteratively redirecting the model). Its distinguishing claim is the separation of pre-generation Framing from post-generation Steering, with Judging as the gate between them. We ground the skills in theory, state five testable propositions, and instantiate the skills in CoReasoningLab, an open platform that presents flawed AI output and scores the three skills independently. Over simulated learners (generated and graded by different models), the skills dissociate: each skill's score tracks its own manipulated competence while staying flat when the others are manipulated, and scores become correlated when a single competence is shared across all three skills. Together these patterns indicate discriminant and convergent validity, and they hold across grader backends from two providers. Human-rater agreement and outcomes are the next steps; we release the instrument, data, and protocol.
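The dissociation pattern described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract describes learners generated and graded by language models, whereas here each learner is just three latent competence levels and each score is that level plus Gaussian noise. The names `simulate`, `pearson`, the sample size, and the noise level are all hypothetical choices made for illustration.

```python
import random
from math import sqrt

SKILLS = ("framing", "judging", "steering")

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def simulate(n, shared, rng, noise=0.1):
    """Return (latent, scores), each a dict mapping skill -> list of n values.

    shared=False: every skill gets its own independently drawn competence.
    shared=True:  one competence level drives all three skills.
    Scores are the latent level plus Gaussian noise (an assumed toy grader).
    """
    latent = {s: [] for s in SKILLS}
    scores = {s: [] for s in SKILLS}
    for _ in range(n):
        common = rng.random()
        for s in SKILLS:
            level = common if shared else rng.random()
            latent[s].append(level)
            scores[s].append(level + rng.gauss(0.0, noise))
    return latent, scores

rng = random.Random(0)

# Independent competences: each score should track only its own latent level.
latent, scores = simulate(2000, shared=False, rng=rng)
cross = {(a, b): pearson(latent[a], scores[b]) for a in SKILLS for b in SKILLS}

# Shared competence: scores on different skills should correlate with each other.
_, shared_scores = simulate(2000, shared=True, rng=rng)
shared_corr = {
    (a, b): pearson(shared_scores[a], shared_scores[b])
    for a in SKILLS for b in SKILLS if a < b
}

for a in SKILLS:
    print(a, [round(cross[(a, b)], 2) for b in SKILLS])
print({pair: round(r, 2) for pair, r in shared_corr.items()})
```

In this toy setup, the rows printed for `cross` should show large values on the diagonal and values near zero elsewhere (the discriminant pattern), while `shared_corr` should show high correlations between skills (the convergent pattern). It says nothing about whether real graders behave this way; that is the question the paper's simulated-learner study addresses.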
