An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
The work addresses scalable bias auditing of educational materials for governance purposes, though its contribution is incremental because it builds on existing agent-based methods.
The paper tackles the detection of historical bias in educational textbooks by proposing an agentic evaluation architecture. The architecture reduces over-penalization: 83.3% of excerpts were classified as pedagogically acceptable, with a mean severity of 2.9/7 versus 5.4/7 under a zero-shot baseline, and it was preferred in 64.8% of blind human comparisons.
History textbooks often contain implicit biases, nationalist framing, and selective omissions that are difficult to audit at scale. We propose an agentic evaluation architecture comprising a multimodal screening agent, a heterogeneous jury of five evaluative agents, and a meta-agent for verdict synthesis and human escalation. A central contribution is a Source Attribution Protocol that distinguishes textbook narrative from quoted historical sources, preventing the misattribution that causes systematic false positives in single-model evaluators. In an empirical study on Romanian upper-secondary history textbooks, 83.3\% of 270 screened excerpts were classified as pedagogically acceptable (mean severity 2.9/7), versus 5.4/7 under a zero-shot baseline, demonstrating that agentic deliberation mitigates over-penalization. In a blind human evaluation (18 evaluators, 54 comparisons), the Independent Deliberation configuration was preferred in 64.8\% of cases over both a heuristic variant and the zero-shot baseline. At approximately \$2 per textbook, these results position agentic evaluation architectures as economically viable decision-support tools for educational governance.
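The flow described above (source attribution, a jury of five, then a meta-agent that synthesizes a verdict and may escalate to a human) might be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the names `Segment`, `Verdict`, `make_juror`, and `evaluate`, the keyword-based jurors, the median aggregation, and the `max_spread` escalation threshold are all assumptions introduced here, and the multimodal screening stage is omitted.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, List


@dataclass
class Segment:
    text: str
    origin: str  # assumed labels: "narrative" or "quoted_source"


@dataclass
class Verdict:
    severity: float  # on the 1-7 scale mentioned in the abstract
    escalate: bool   # True if the jury disagrees enough to involve a human


# A juror maps text to a severity score in 1..7 (stand-in for a model-backed agent).
Juror = Callable[[str], int]

# Hypothetical list of loaded terms, used only by the toy jurors below.
LOADED = {"glorious", "treacherous"}


def make_juror(weight: int) -> Juror:
    """Build a toy juror whose sensitivity to loaded terms is `weight`."""
    def juror(text: str) -> int:
        hits = sum(w.strip(".,;") in LOADED for w in text.lower().split())
        return min(7, 1 + weight * hits)
    return juror


def evaluate(segments: List[Segment], jurors: List[Juror],
             use_attribution: bool = True, max_spread: int = 3) -> Verdict:
    """Score an excerpt with every juror, then aggregate (assumed: median)."""
    if use_attribution:
        # Source attribution: judge only the textbook's own narrative,
        # so wording inside quoted historical sources is not blamed on it.
        segments = [s for s in segments if s.origin == "narrative"]
    text = " ".join(s.text for s in segments)
    scores = [j(text) for j in jurors]
    spread = max(scores) - min(scores)
    return Verdict(severity=median(scores), escalate=spread > max_spread)


segments = [
    Segment("The treaty was signed in 1920.", "narrative"),
    Segment("Our glorious nation crushed the treacherous enemy.", "quoted_source"),
]
jurors = [make_juror(w) for w in (1, 1, 2, 2, 3)]  # heterogeneous jury of five

print(evaluate(segments, jurors))                         # with attribution
print(evaluate(segments, jurors, use_attribution=False))  # without attribution
```

In this toy setup, scoring without attribution lets the quoted source's loaded wording raise the severity and trigger escalation, while scoring the narrative alone does not, which mirrors the false-positive mechanism the abstract attributes to single-model evaluators.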