AI · Sep 21, 2023

Inferring Capabilities from Task Performance with Bayesian Triangulation

John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, José Hernández-Orallo

Cambridge

arXiv:2309.11975v213.913 citations · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the need for richer evaluation methods for general AI systems. It offers a capability-oriented approach to model assessment, and its contribution is incremental: it applies established Bayesian inference techniques to that assessment problem.

The paper tackles the problem of characterizing machine learning models by inferring their cognitive profiles from diverse experimental data. It introduces measurement layouts, which model how task-instance features interact with system capabilities to affect performance, and uses Bayesian inference to recover profiles for agents in two scenarios: 68 contestants in the AnimalAI Olympics and 30 synthetic agents evaluated on O-PIAAGETS, an object permanence battery.

As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to be able to infer capabilities from non-populational data -- a challenge for traditional psychometric and inferential tools. Using the Bayesian probabilistic programming library PyMC, we infer different cognitive profiles for agents in two scenarios: 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery. We showcase the potential for capability-oriented evaluation.
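The core idea, inferring a latent capability from how success varies with task-instance demands, can be illustrated with a deliberately small sketch. It is not the authors' measurement layout and does not use PyMC: it swaps in a plain grid approximation of the posterior so that it runs with the standard library alone. The single capability, the logistic link between capability and demand, the uniform prior on [0, 10], and the demand and outcome values are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real margin to a success probability."""
    return 1.0 / (1.0 + math.exp(-x))


def capability_posterior(demands, successes, grid_size=201, lo=0.0, hi=10.0):
    """Grid approximation of the posterior over one latent capability.

    Assumed model (hypothetical, for illustration only):
        capability ~ Uniform(lo, hi)
        success_i  ~ Bernoulli(sigmoid(capability - demand_i))
    """
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]

    log_post = []
    for ability in grid:
        # Uniform prior adds a constant, so only the log-likelihood matters.
        log_lik = 0.0
        for demand, success in zip(demands, successes):
            p = sigmoid(ability - demand)
            log_lik += math.log(p) if success else math.log(1.0 - p)
        log_post.append(log_lik)

    # Normalise in a numerically stable way.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, probs):
    """Expected capability under the grid posterior."""
    return sum(g * p for g, p in zip(grid, probs))


# Hypothetical data: one agent, eight task instances of increasing demand.
demands = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
successes = [1, 1, 1, 1, 0, 1, 0, 0]

grid, probs = capability_posterior(demands, successes)
mean = posterior_mean(grid, probs)
print(f"posterior mean capability: {mean:.2f}")
```

Because the posterior is computed per agent from that agent's own results, no population of test-takers is needed, which is the sense in which such inference works on non-populational data. A full measurement layout would combine several capabilities and instance features in one model, which is where a probabilistic programming library such as PyMC becomes useful.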
