AIRO · May 21, 2018

Learning What Information to Give in Partially Observed Domains

arXiv:1805.08263v4 · 3 citations
AI Analysis

This paper addresses the challenge of human-robot collaboration in environments the human cannot observe, though it appears incremental because it builds on existing belief MDP frameworks.

The paper tackles the problem of an autonomous agent that must plan its actions and transmit declarative information to a human teammate in a partially observed environment. It models the human's preferences information-theoretically, gives a tractable algorithm for solving the resulting problem approximately along with an algorithm for learning the preferences online, and validates the approach in simulated search-and-recover domains.

In many robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. In this work, we address the algorithmic question of how the agent should plan out what actions to take and what information to transmit. Naturally, one would expect the human to have preferences, which we model information-theoretically by scoring transmitted information based on the change it induces in weighted entropy of the human's belief state. We formulate this setting as a belief MDP and give a tractable algorithm for solving it approximately. Then, we give an algorithm that allows the agent to learn the human's preferences online, through exploration. We validate our approach experimentally in simulated discrete and continuous partially observed search-and-recover domains. Visit http://tinyurl.com/chitnis-corl-18 for a supplementary video.
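To make the scoring idea in the abstract concrete, here is a minimal sketch of how a transmitted statement could be scored by the change it induces in the weighted entropy of the human's belief. The abstract does not give the exact formula, so everything here is an assumption for illustration: the per-state weighting form -sum_i w_i p_i log p_i, the function names, and the simple Bayesian update that zeroes out states ruled out by the statement.

```python
import math

def weighted_entropy(belief, weights):
    """Weighted entropy -sum_i w_i * p_i * log(p_i), in nats.

    Assumed form: one non-negative weight per state. The paper's exact
    weighting scheme is not stated in the abstract.
    """
    return -sum(w * p * math.log(p) for p, w in zip(belief, weights) if p > 0)

def condition(belief, consistent):
    """Hypothetical belief update: drop states the statement rules out, renormalize."""
    masked = [p if ok else 0.0 for p, ok in zip(belief, consistent)]
    total = sum(masked)
    if total == 0:
        raise ValueError("statement contradicts the current belief")
    return [p / total for p in masked]

def information_score(prior, posterior, weights):
    """Score a statement by the reduction in weighted entropy it induces."""
    return weighted_entropy(prior, weights) - weighted_entropy(posterior, weights)

# The human believes an object is equally likely to be in any of four rooms.
prior = [0.25, 0.25, 0.25, 0.25]
weights = [1.0, 1.0, 1.0, 1.0]  # uniform weights reduce to ordinary entropy

# The agent transmits "the object is not in room 3 or room 4".
posterior = condition(prior, [True, True, False, False])
score = information_score(prior, posterior, weights)
print(posterior, score)
```

With uniform weights the score is the ordinary entropy reduction, here log 4 - log 2 = log 2 nats. Under this assumed form, raising the weight on a state makes statements that resolve uncertainty about that state score higher, which is one way a preference over kinds of information could be expressed.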

