An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks

arXiv:2604.07883

AI Analysis

This paper addresses scalable bias auditing of educational materials as a governance problem, though the contribution is incremental because it builds on existing agent-based methods.

The paper tackles historical bias detection in educational textbooks by proposing an agentic evaluation architecture. The architecture reduced over-penalization: it classified 83.3% of excerpts as pedagogically acceptable with a mean severity of 2.9/7, compared with 5.4/7 under a zero-shot baseline, and its Independent Deliberation configuration was preferred in 64.8% of blind human comparisons.

History textbooks often contain implicit biases, nationalist framing, and selective omissions that are difficult to audit at scale. We propose an agentic evaluation architecture comprising a multimodal screening agent, a heterogeneous jury of five evaluative agents, and a meta-agent for verdict synthesis and human escalation. A central contribution is a Source Attribution Protocol that distinguishes textbook narrative from quoted historical sources, preventing the misattribution that causes systematic false positives in single-model evaluators. In an empirical study on Romanian upper-secondary history textbooks, 83.3% of 270 screened excerpts were classified as pedagogically acceptable (mean severity 2.9/7), versus 5.4/7 under a zero-shot baseline, demonstrating that agentic deliberation mitigates over-penalization. In a blind human evaluation (18 evaluators, 54 comparisons), the Independent Deliberation configuration was preferred in 64.8% of cases over both a heuristic variant and the zero-shot baseline. At approximately $2 per textbook, these results position agentic evaluation architectures as economically viable decision-support tools for educational governance.
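The three-stage flow described in the abstract (screening, a five-member jury, and a meta-agent that synthesizes a verdict or escalates to a human) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the class and function names, the mean-based aggregation, the acceptance threshold of 4.0, and the disagreement threshold of 1.5 are all assumptions, and the jurors are plain functions standing in for the LLM-based agents in the paper. The one idea taken directly from the abstract is that jurors score only the textbook narrative, never the quoted historical sources.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Callable, List, Optional

# All names, thresholds, and the aggregation rule below are hypothetical.

@dataclass
class Excerpt:
    narrative: str             # text written by the textbook authors
    quoted_sources: List[str]  # quoted historical sources, kept separate

@dataclass
class Verdict:
    severity: float   # mean jury score on the 1-7 scale
    acceptable: bool  # True when severity is below the assumed threshold
    escalate: bool    # True when jurors disagree enough to need a human

# A juror maps narrative text to a severity score in 1..7.
Juror = Callable[[str], int]

def screen(excerpt: Excerpt) -> bool:
    """Stand-in screening step: keep excerpts that contain narrative text."""
    return bool(excerpt.narrative.strip())

def synthesize(scores: List[int],
               accept_below: float = 4.0,
               max_spread: float = 1.5) -> Verdict:
    """Stand-in meta-agent: average the scores, escalate on disagreement."""
    avg = fmean(scores)
    spread = pstdev(scores)
    return Verdict(severity=avg,
                   acceptable=avg < accept_below,
                   escalate=spread > max_spread)

def evaluate(excerpt: Excerpt, jury: List[Juror]) -> Optional[Verdict]:
    """Run screening, jury scoring, and synthesis for one excerpt."""
    if not screen(excerpt):
        return None
    # Source attribution: jurors see the narrative only, not the quotes.
    scores = [juror(excerpt.narrative) for juror in jury]
    return synthesize(scores)

def keyword_juror(text: str) -> int:
    """Toy juror: flags one loaded term, otherwise returns a low score."""
    return 6 if "barbarians" in text.lower() else 2

excerpt = Excerpt(
    narrative="The chronicle describes the invaders in hostile terms.",
    quoted_sources=["They were barbarians who spared no one."],
)
verdict = evaluate(excerpt, [keyword_juror] * 5)
print(verdict)
```

Because the loaded term appears only in the quoted source, the toy jury scores the narrative as low severity. Passing the quote to the jurors as if it were narrative would raise the score, which is the kind of misattribution the Source Attribution Protocol is described as preventing.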

