AI GT LG · Sep 16, 2016

A Formal Solution to the Grain of Truth Problem

Jan Leike, Jessica Taylor, Benya Fallenstein

arXiv:1609.05058v1 · 11.318 citations

Originality: Highly original

AI Analysis

This paper provides a formal solution to a foundational problem in multi-agent learning: agents based on Thompson sampling learn to play ε-Nash equilibria in unknown computable environments. The results are purely theoretical, but they can be computationally approximated arbitrarily closely.

The paper tackles the grain of truth problem in multi-agent environments by constructing a class of policies that includes all computable policies and Bayes-optimal policies for every lower semicomputable prior, and shows that agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable environments.

A Bayesian agent acting in a multi-agent environment learns to predict the other agents' policies if its prior assigns positive probability to them (in other words, its prior contains a grain of truth). Finding a reasonably large class of policies that contains the Bayes-optimal policies with respect to this class is known as the grain of truth problem. Only small classes are known to have a grain of truth, and the literature contains several related impossibility results. In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class. When the environment is unknown, Bayes-optimal agents may fail to act optimally even asymptotically. However, agents based on Thompson sampling converge to play ε-Nash equilibria in arbitrary unknown computable multi-agent environments. While these results are purely theoretical, we show that they can be computationally approximated arbitrarily closely.
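The abstract does not spell out how a Thompson sampling agent operates, so the following is only a toy sketch of the general idea, not the paper's construction. It assumes a finite class of two-armed Bernoulli bandit hypotheses in place of the paper's class of computable environments, and a single agent rather than a multi-agent setting. The function name `thompson_sampling` and all parameters are hypothetical. The prior has a grain of truth here simply because it puts positive weight on the true hypothesis.

```python
import random


def thompson_sampling(hypotheses, prior, true_index, steps, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    hypotheses: list of tuples; hypotheses[i][a] is the success probability
                of arm a under hypothesis i (an illustrative assumption).
    prior:      list of prior weights, one per hypothesis. A grain of truth
                means prior[true_index] > 0.
    Returns the final posterior and the number of pulls per arm.
    """
    rng = random.Random(seed)
    posterior = list(prior)
    true_env = hypotheses[true_index]
    pulls = [0] * len(true_env)

    for _ in range(steps):
        # 1. Sample one hypothesis from the current posterior.
        sampled = rng.choices(hypotheses, weights=posterior, k=1)[0]

        # 2. Act optimally as if the sampled hypothesis were true.
        arm = max(range(len(sampled)), key=lambda a: sampled[a])

        # 3. Observe a reward drawn from the true environment.
        reward = 1 if rng.random() < true_env[arm] else 0
        pulls[arm] += 1

        # 4. Bayes update: reweight each hypothesis by the likelihood
        #    it assigns to the observed reward, then renormalize.
        likelihoods = [h[arm] if reward else 1.0 - h[arm] for h in hypotheses]
        unnormalized = [w * l for w, l in zip(posterior, likelihoods)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]

    return posterior, pulls
```

For instance, calling `thompson_sampling([(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)], [1/3, 1/3, 1/3], true_index=0, steps=500)` runs the loop with a uniform prior over three hypotheses, the first of which is the true one. If the prior assigned zero weight to the true hypothesis, the Bayes update could never recover it, which is the failure mode that the grain of truth condition rules out.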
