Learning from Humans as an I-POMDP
This work addresses the challenge of interactive learning from humans for AI agents, presenting a theoretical framework that incrementally extends existing POMDP methods to multi-agent settings.
The paper tackles the problem of an agent learning interactively from a human teacher by formulating it as an Interactive Partially Observable Markov Decision Process (I-POMDP), which provides a principled framework for action selection and belief updates while supporting common teacher signals.
The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework that extends the POMDP to the multi-agent setting by including models of other agents in the state space. This paper argues for formulating the problem of an agent learning interactively from a human teacher as an I-POMDP in which the agent \emph{programming} to be learned is captured by random variables in the agent's state space, all \emph{signals} from the human teacher are treated as observed random variables, and the human teacher, modeled as a distinct agent, is explicitly represented in the agent's state space. The main benefits of this approach are: (i) a principled action selection mechanism, (ii) a principled belief update mechanism, (iii) support for the most common teacher \emph{signals}, and (iv) the anticipated production of complex beneficial interactions. The proposed formulation, its benefits, and several open questions are presented.
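To make the belief update concrete, the following is a minimal sketch rather than the paper's formulation. It assumes a toy problem in which the hidden state pairs the \emph{programming} to be learned with a model of the teacher, and the teacher's \emph{signal} is an observation whose likelihood depends on both. The names `PROGRAMS`, `TEACHERS`, `signal_likelihood`, and `update_belief`, along with the accuracy values, are hypothetical and chosen for illustration only. The state is treated as static, and the nested beliefs and planning of a full I-POMDP are omitted.

```python
from itertools import product

# Hypothetical toy state space: (programming to be learned, teacher model).
PROGRAMS = ("go_left", "go_right")
TEACHERS = ("reliable", "noisy")


def signal_likelihood(signal, action, program, teacher):
    """P(signal | action, program, teacher) under an assumed teacher model."""
    # Assumed accuracies: how often each teacher type rewards a correct action.
    accuracy = 0.95 if teacher == "reliable" else 0.6
    p_reward = accuracy if action == program else 1.0 - accuracy
    return p_reward if signal == "reward" else 1.0 - p_reward


def update_belief(belief, action, signal):
    """Bayes rule over the joint state; no state transition is modeled."""
    unnormalized = {
        state: prob * signal_likelihood(signal, action, *state)
        for state, prob in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {state: prob / total for state, prob in unnormalized.items()}


def marginal_program(belief):
    """Marginalize out the teacher model to get a belief over the programming."""
    marginal = {program: 0.0 for program in PROGRAMS}
    for (program, _teacher), prob in belief.items():
        marginal[program] += prob
    return marginal


# Uniform prior over the four joint states.
prior = {state: 0.25 for state in product(PROGRAMS, TEACHERS)}
# The agent acts "go_left" and the teacher signals "reward".
posterior = update_belief(prior, "go_left", "reward")
program_belief = marginal_program(posterior)
```

Because the teacher model is part of the state, a single signal updates the belief about the programming and the belief about the teacher's reliability together. This is the property that a belief over the programming alone would not capture.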