AI · Sep 30, 2025

Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees

Craig Greenberg, Patrick Hall, Theodore Jensen, Kristen Greene, Razvan Amironesei

arXiv:2509.26632v1 · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need of researchers and practitioners for broader and more interpretable AI evaluation. It is an incremental advance: it builds on existing calls for expanded measurement and offers a foundational approach to answering them.

This paper tackles the problem of limited transparency and integration in AI system evaluation by introducing measurement trees, a novel class of metrics that produce hierarchical directed graphs to combine heterogeneous evidence, and demonstrates their utility through a large-scale measurement exercise with open-source code.

This paper introduces measurement trees, a novel class of metrics designed to combine various constructs into an interpretable multi-level representation of a measurand. Unlike conventional metrics that yield single values, vectors, surfaces, or categories, measurement trees produce a hierarchical directed graph in which each node summarizes its children through user-defined aggregation methods. In response to recent calls to expand the scope of AI system evaluation, measurement trees enhance metric transparency and facilitate the integration of heterogeneous evidence, such as agentic, business, energy-efficiency, sociotechnical, or security signals. We present definitions and examples, demonstrate practical utility through a large-scale measurement exercise, and provide accompanying open-source Python code. By operationalizing a transparent approach to measurement of complex constructs, this work offers a principled foundation for broader and more interpretable AI evaluation.
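The central idea in the abstract, a hierarchy in which each node summarizes its children through a user-defined aggregation method, can be illustrated with a minimal Python sketch. This is not the authors' released code. The `Node` class, its fields, the `build_example` helper, and the example signal names and scores are all hypothetical, chosen only to show the shape of the structure.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Optional


@dataclass
class Node:
    """One node of an illustrative measurement tree (hypothetical design)."""
    name: str
    value: Optional[float] = None                     # set directly on leaves
    children: List["Node"] = field(default_factory=list)
    aggregate: Callable[[List[float]], float] = mean  # user-defined summary

    def evaluate(self) -> float:
        """Compute this node's value bottom-up from its children."""
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf '{self.name}' has no value")
            return self.value
        self.value = self.aggregate([c.evaluate() for c in self.children])
        return self.value

    def render(self, depth: int = 0) -> str:
        """Return an indented text view so every level stays inspectable."""
        lines = [f"{'  ' * depth}{self.name}: {self.value}"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


def build_example() -> Node:
    """Build a small tree from made-up scores in [0, 1]."""
    security = Node(
        "security",
        children=[Node("jailbreak_resistance", 0.75),
                  Node("prompt_injection_resistance", 0.5)],
        aggregate=min,  # assumption: worst case dominates for security
    )
    efficiency = Node(
        "energy_efficiency",
        children=[Node("normalized_energy_score", 0.25)],
    )
    # The root uses the default aggregation (arithmetic mean).
    return Node("overall", children=[security, efficiency])


if __name__ == "__main__":
    root = build_example()
    root.evaluate()
    print(root.render())
```

Because each branch keeps its own aggregation function and its own intermediate value, a reader can trace a root score back to the evidence beneath it, which is the transparency property the abstract emphasizes. The choice of `min` for the security branch and the mean elsewhere is an assumption made for the example, not a recommendation from the paper.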
