On the Complexity of Exploration in Goal-Driven Navigation
This work addresses intelligent exploration for AI agents in navigation tasks. Its contribution is incremental, since it builds on existing hierarchical methods such as h-DQN.
The paper studies how hierarchical policy design affects exploration in goal-driven navigation. It introduces EscapeRoom environments and shows that the analytically estimated hitting time in goal dependency graphs is an informative metric of environment complexity. The results indicate that solving environments beyond a certain level of complexity requires hierarchical approaches.
Building agents that can explore their environments intelligently is a challenging open problem. In this paper, we take a step towards understanding how a hierarchical design of the agent's policy can affect its exploration capabilities. First, we design EscapeRoom environments, where the agent must figure out how to navigate to the exit by accomplishing a number of intermediate tasks (\emph{subgoals}), such as finding keys or opening doors. Our environments are procedurally generated and vary in complexity, which can be controlled by the number of subgoals and the relationships between them. Next, we propose to measure the complexity of each environment by constructing dependency graphs between the goals and analytically computing \emph{hitting times} of a random walk in the graph. We empirically evaluate Proximal Policy Optimization (PPO) with sparse and shaped rewards, a variation of policy sketches, and a hierarchical version of PPO (called HiPPO) akin to h-DQN. We show that the analytically estimated \emph{hitting time} in goal dependency graphs is an informative metric of environment complexity. We conjecture that the result should hold for environments other than navigation. Finally, we show that solving environments beyond a certain level of complexity requires hierarchical approaches.
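
To make the hitting-time metric concrete, the following is a minimal sketch of how the expected hitting time of a random walk on a small goal dependency graph could be computed. The graph, its state names (`start`, `key`, `door`, `exit`), the transition probabilities, and the helper `hitting_times` are illustrative assumptions, not the construction used in the paper. The sketch solves the standard linear system h(target) = 0 and h(s) = 1 + sum_j P(s, j) h(j) for every other state s.

```python
from fractions import Fraction


def hitting_times(transitions, target):
    """Expected number of steps for a random walk to first reach `target`.

    `transitions` maps each state to a dict {next_state: probability}.
    Hypothetical helper: solves h(target) = 0 and
    h(s) = 1 + sum_j P[s][j] * h(j) for every other state s.
    """
    states = [s for s in transitions if s != target]
    index = {s: k for k, s in enumerate(states)}
    n = len(states)

    # Augmented matrix for (I - Q) h = 1, where Q holds the transition
    # probabilities among the non-target states.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in states:
        r = index[s]
        A[r][r] += 1
        A[r][n] = Fraction(1)
        for nxt, p in transitions[s].items():
            if nxt != target:
                A[r][index[nxt]] -= Fraction(p)

    # Gauss-Jordan elimination with exact rational arithmetic.
    # Raises StopIteration if the system is singular, which happens
    # when the target is unreachable from some state.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]

    h = {s: A[index[s]][n] for s in states}
    h[target] = Fraction(0)
    return h


# Assumed toy graph: a chain of three subgoals, where each step either
# completes the next subgoal or makes no progress, with equal probability.
half = Fraction(1, 2)
chain = {
    "start": {"start": half, "key": half},
    "key": {"key": half, "door": half},
    "door": {"door": half, "exit": half},
    "exit": {"exit": Fraction(1)},
}

h = hitting_times(chain, "exit")
print(h["start"])  # 6
```

In this toy chain each subgoal takes two steps in expectation, so the hitting time from `start` grows linearly with the number of subgoals. Graphs in which a failed step undoes earlier progress would yield much larger values, which is the kind of difference such a metric is meant to capture.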