AI LG NE

Oct 21, 2020

Meta-trained agents implement Bayes-optimal agents

Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

arXiv:2010.11223v1

24.852 citations

h-index: 26

Originality: Highly original

AI Analysis

This work provides empirical validation for a theoretical claim, potentially enabling approximation of Bayes-optimal agents in complex task distributions where tractable models are lacking.

The paper investigates whether memory-based meta-learning produces agents that behave Bayes-optimally, showing empirically that meta-learned and Bayes-optimal agents share similar computational structures and that Bayes-optimal agents are fixed points of meta-learning dynamics.

Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance arises because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents, even for task distributions for which we currently don't possess tractable models.
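To make "Bayes-optimal" concrete on a prediction task, here is a minimal sketch of the kind of reference agent a meta-trained predictor would be compared against. It assumes a coin-flip task whose bias is drawn from a Beta prior. The task, the prior, and the function name `bayes_optimal_prediction` are illustrative assumptions, not details taken from the paper.

```python
from fractions import Fraction


def bayes_optimal_prediction(observations, alpha=1, beta=1):
    """Return P(next outcome = 1 | observations) for a coin of unknown bias.

    Assumption (illustrative only): the bias is drawn from a Beta(alpha, beta)
    prior, so the posterior predictive has a closed form that depends only on
    the counts of ones and zeros seen so far.
    """
    ones = sum(observations)   # number of 1s observed
    total = len(observations)  # number of observations
    return Fraction(ones + alpha, total + alpha + beta)


# Before any data, the uniform prior gives an even prediction.
print(bayes_optimal_prediction([]))         # 1/2
# After two 1s and one 0, the prediction shifts toward 1.
print(bayes_optimal_prediction([1, 1, 0]))  # 3/5
```

The prediction depends on the history only through its counts, so any agent that matches it must track an equivalent summary of the past in its memory. That is one informal way to read the abstract's claim about shared computational structure.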

