CY · AI · Aug 13, 2025

STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports

arXiv:2508.09853v2 · 9 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for clear and trustworthy reporting of AI safety evaluations, especially for high-risk applications. Its contribution is incremental, since it builds on existing transparency efforts.

The paper tackles the problem of insufficient transparency in AI model evaluations for dangerous capabilities, particularly in chemical and biological domains, by proposing STREAM, a standard for reporting evaluation results, developed with expert consultation and including practical examples and a template.

Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations - including what they test, how they are conducted, and how their results inform decisions - is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource to help AI developers present evaluation results more clearly, and (2) help third parties identify whether model reports provide sufficient detail to assess the rigor of the ChemBio evaluations. We concretely demonstrate our proposed best practices with "gold standard" examples, and also provide a three-page reporting template to enable AI developers to implement our recommendations more easily.

