AI GT | Jul 28, 2015

Belief and Truth in Hypothesised Behaviours

Stefano V. Albrecht, Jacob W. Crandall, Subramanian Ramamoorthy

arXiv:1507.07688v3 | 21.983 citations | h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses task completion and payoff maximization in multi-agent AI systems, providing insights into belief correctness and optimal strategies, but it is incremental as it builds on existing game theory concepts.

The paper tackles the problem of how an agent can plan its actions when the behaviours of other agents are unknown. The agent hypothesises a set of types and updates its beliefs over them based on the observed actions. The paper shows that prior beliefs can significantly affect long-term payoff maximisation and that they can be computed automatically with consistent performance effects.

There is a long history in game theory on the topic of Bayesian or "rational" learning, in which each player maintains beliefs over a set of alternative behaviours, or types, for the other players. This idea has gained increasing interest in the artificial intelligence (AI) community, where it is used as a method to control a single agent in a system composed of multiple agents with unknown behaviours. The idea is to hypothesise a set of types, each specifying a possible behaviour for the other agents, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions of the agents. The game theory literature studies this idea primarily in the context of equilibrium attainment. In contrast, many AI applications have a focus on task completion and payoff maximisation. With this perspective in mind, we identify and address a spectrum of questions pertaining to belief and truth in hypothesised types. We formulate three basic ways to incorporate evidence into posterior beliefs and show when the resulting beliefs are correct, and when they may fail to be correct. Moreover, we demonstrate that prior beliefs can have a significant impact on our ability to maximise payoffs in the long term, and that they can be computed automatically with consistent performance effects. Furthermore, we analyse the conditions under which we are able to complete our task optimally, despite inaccuracies in the hypothesised types. Finally, we show how the correctness of hypothesised types can be ascertained during the interaction via an automated statistical analysis.
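
As a rough illustration of the idea of maintaining beliefs over hypothesised types, the sketch below updates a prior over two types by multiplying in the probability each type assigns to every observed action and then renormalising. This is only one simple way to incorporate evidence and is not claimed to be any of the paper's three formulations. The type names (mostly_cooperates, alternates), the action labels "C" and "D", and the probabilities are all hypothetical and chosen purely for the example.

```python
from typing import Callable, Dict, List, Sequence

# A hypothesised type maps the other agent's past actions to a
# probability distribution over its next action (illustrative assumption).
AgentType = Callable[[Sequence[str]], Dict[str, float]]


def mostly_cooperates(history: Sequence[str]) -> Dict[str, float]:
    # Hypothetical type: plays "C" with high probability regardless of history.
    return {"C": 0.9, "D": 0.1}


def alternates(history: Sequence[str]) -> Dict[str, float]:
    # Hypothetical type: tends to switch away from its own previous action.
    if not history:
        return {"C": 0.5, "D": 0.5}
    previous = history[-1]
    switched = "D" if previous == "C" else "C"
    return {switched: 0.9, previous: 0.1}


def update_beliefs(
    prior: Dict[str, float],
    types: Dict[str, AgentType],
    observed: Sequence[str],
) -> Dict[str, float]:
    """Return the posterior over type names after the observed actions."""
    beliefs = dict(prior)
    history: List[str] = []
    for action in observed:
        # Weight each type by the probability it gave to the observed action.
        for name, agent_type in types.items():
            beliefs[name] *= agent_type(history).get(action, 0.0)
        total = sum(beliefs.values())
        if total == 0.0:
            raise ValueError("observations are impossible under every type")
        beliefs = {name: weight / total for name, weight in beliefs.items()}
        history.append(action)
    return beliefs


types = {"mostly_cooperates": mostly_cooperates, "alternates": alternates}
prior = {"mostly_cooperates": 0.5, "alternates": 0.5}
posterior = update_beliefs(prior, types, ["C", "D", "C", "D"])
print(posterior)
```

With a uniform prior and the alternating observations used here, most of the posterior mass moves to the alternates type. If the true behaviour is not among the hypothesised types, the posterior still concentrates on whichever type explains the data best, which is one reason the question of checking the correctness of hypothesised types matters.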
