AI · Sep 21, 2023

Inferring Capabilities from Task Performance with Bayesian Triangulation

Cambridge
arXiv:2309.11975v2 · 13 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need for richer evaluation methods for general AI systems. It offers a capability-oriented approach; its application of Bayesian inference to model assessment is an incremental advance.

The paper tackles the problem of characterizing machine learning models by inferring their cognitive profiles from diverse experimental data. It introduces measurement layouts that model how task features interact with capabilities, and uses Bayesian methods to infer profiles for agents in two scenarios: 68 contestants in the AnimalAI Olympics and 30 synthetic agents in an object permanence battery.

As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to infer capabilities from non-populational data, which is a challenge for traditional psychometric and inferential tools. Using the Bayesian probabilistic programming library PyMC, we infer different cognitive profiles for agents in two scenarios: 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery. We showcase the potential of capability-oriented evaluation.
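The abstract does not give a concrete layout, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It assumes a hypothetical layout with two capabilities (`nav` for navigation, `vis` for visual acuity) and two task-instance features (`distance`, `smallness`), combined non-compensatorily: success requires both reaching and seeing the goal, so the two sigmoid terms multiply. The paper uses PyMC; to stay self-contained, this sketch swaps in exhaustive grid approximation of the posterior in plain Python instead of PyMC's samplers. All names, priors, and the simulated agent are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_success(nav, vis, distance, smallness):
    """Hypothetical non-compensatory measurement layout (assumption).

    The agent succeeds only if it can both reach the goal (navigation
    ability vs. distance) and perceive it (visual ability vs. smallness),
    so the two ability terms multiply rather than add.
    """
    return sigmoid(nav - distance) * sigmoid(vis - smallness)


def simulate(nav, vis, features, reps, rng):
    """Count successes out of `reps` trials for each (distance, smallness) pair."""
    counts = {}
    for d, s in features:
        p = p_success(nav, vis, d, s)
        k = sum(rng.random() < p for _ in range(reps))
        counts[(d, s)] = (k, reps)
    return counts


def grid_posterior(counts, grid, prior_sd=2.0):
    """Joint posterior over (nav, vis) on a grid, Normal(0, prior_sd) priors, binomial likelihood."""
    log_post = {}
    for nav in grid:
        for vis in grid:
            lp = -(nav ** 2 + vis ** 2) / (2 * prior_sd ** 2)
            for (d, s), (k, n) in counts.items():
                p = min(max(p_success(nav, vis, d, s), 1e-12), 1 - 1e-12)
                lp += k * math.log(p) + (n - k) * math.log(1 - p)
            log_post[(nav, vis)] = lp
    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {key: math.exp(v - m) / z for key, v in log_post.items()}


def posterior_means(post):
    nav_mean = sum(p * nav for (nav, _), p in post.items())
    vis_mean = sum(p * vis for (_, vis), p in post.items())
    return nav_mean, vis_mean


# Usage: a simulated agent with strong navigation (2.0) and weaker vision (0.5),
# tested on every combination of feature levels, 20 trials each.
rng = random.Random(0)
levels = [-1, 0, 1, 2, 3]
features = [(d, s) for d in levels for s in levels]
counts = simulate(nav=2.0, vis=0.5, features=features, reps=20, rng=rng)
grid = [round(-3 + 0.1 * i, 1) for i in range(81)]
post = grid_posterior(counts, grid)
nav_mean, vis_mean = posterior_means(post)
print(f"posterior mean nav={nav_mean:.2f}, vis={vis_mean:.2f}")
```

Each trial yields only a pass/fail outcome, yet varying distance at fixed smallness changes only the navigation term, and vice versa, so the posterior can separate the two capabilities for a single agent without a population of agents. This is a toy illustration of triangulating capabilities from task features; a PyMC version would express the same priors and likelihood and sample with MCMC rather than enumerating a grid.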
