Goal-Directed Planning by Reinforcement Learning and Active Inference
This work addresses a foundational problem in decision making for AI and robotics, although its contribution appears incremental because it combines two existing methods, reinforcement learning and active inference.
The paper asks what distinguishes goal-directed from habitual behavior and proposes a computational framework that integrates reinforcement learning and active inference in a single neural network model. It demonstrates the framework on a sensorimotor navigation task with camera observations and continuous actions.
What is the difference between goal-directed and habitual behavior? We propose a novel computational framework of decision making with Bayesian inference, in which everything is integrated into a single neural network model. The model learns to predict environmental state transitions by self-exploration and generates motor actions by sampling stochastic internal states $z$. Habitual behavior, which is obtained from the prior distribution of $z$, is acquired by reinforcement learning. Goal-directed behavior is determined from the posterior distribution of $z$ by planning, using active inference, which optimizes the past, current, and future $z$ by minimizing the variational free energy for the desired future observation, constrained by the observed sensory sequence. We demonstrate the effectiveness of the proposed framework by experiments in a sensorimotor navigation task with camera observations and continuous motor actions.
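To make the prior/posterior distinction concrete, the following is a minimal toy sketch and not the model described above. It assumes a single time step, a hypothetical linear decoder `W @ z + b` standing in for the learned network, and Gaussian distributions with fixed variances `sigma_o` and `sigma_p`. All names and values are illustrative assumptions. The prior mean `mu_p` plays the role of the habitual internal state, and planning shifts the posterior mean `mu_q` by gradient descent on a simplified free energy (goal prediction error plus a KL term that keeps the posterior close to the prior).

```python
import numpy as np


def free_energy(mu_q, mu_p, W, b, goal, sigma_o=0.5, sigma_p=1.0):
    """Simplified free energy for a Gaussian posterior with fixed variance.

    accuracy:   squared error between the predicted and desired observation
    complexity: KL divergence between posterior and prior (equal variances)
    """
    err = W @ mu_q + b - goal
    accuracy = 0.5 * np.sum(err ** 2) / sigma_o ** 2
    complexity = 0.5 * np.sum((mu_q - mu_p) ** 2) / sigma_p ** 2
    return accuracy + complexity


def plan(mu_p, W, b, goal, sigma_o=0.5, sigma_p=1.0, lr=0.05, steps=200):
    """Gradient descent on the free energy, starting from the prior mean."""
    mu_q = np.array(mu_p, dtype=float)  # copy so the prior is left untouched
    for _ in range(steps):
        err = W @ mu_q + b - goal
        grad = W.T @ err / sigma_o ** 2 + (mu_q - mu_p) / sigma_p ** 2
        mu_q -= lr * grad
    return mu_q


# Hypothetical toy setup: 2-D internal state, 2-D observation.
W = np.array([[1.0, 0.5], [0.0, 1.0]])  # assumed decoder weights
b = np.zeros(2)                         # assumed decoder bias
mu_p = np.zeros(2)                      # prior mean ("habitual" state)
goal = np.array([1.0, -1.0])            # desired future observation

mu_q = plan(mu_p, W, b, goal)           # posterior mean ("goal-directed" state)
```

In this sketch the ratio of `sigma_o` to `sigma_p` controls how far planning may pull the internal state away from the habitual prior: a small `sigma_o` weights the goal heavily, while a small `sigma_p` keeps the posterior near the prior.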