CY AI · Aug 13, 2025

STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports

Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti

arXiv:2508.09853v2 · 9 citations · h-index: 13

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for clear and trustworthy reporting in AI safety evaluations, especially for high-risk applications. Its contribution is incremental, building on existing transparency efforts.

The paper tackles insufficient transparency in AI model evaluations of dangerous capabilities, particularly in the chemical and biological domains. It proposes STREAM, a standard for reporting evaluation results, developed with expert consultation and accompanied by practical examples and a template.

Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations - including what they test, how they are conducted, and how their results inform decisions - is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource to help AI developers present evaluation results more clearly, and (2) help third parties identify whether model reports provide sufficient detail to assess the rigor of the ChemBio evaluations. We concretely demonstrate our proposed best practices with "gold standard" examples, and also provide a three-page reporting template to enable AI developers to implement our recommendations more easily.
