AI · LG · Feb 3, 2022

ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search

arXiv:2202.01461v4 · 3 citations · Has Code
AI Analysis

This work addresses a key limitation in online search algorithms for decision-making problems, offering an incremental improvement by integrating exploration into gradient-based methods.

The paper tackles two complementary weaknesses of online search: tree-based algorithms do not share information across similar states, while Policy Gradient Search shares information but lacks an explicit exploration mechanism. It proposes ExPoSe, which combines gradient-based updates of the search policy with explicit exploration, and reports that it consistently outperforms other popular online search algorithms on Atari games, Sokoban, and Hamiltonian cycle search in sparse graphs.

Online tree-based search algorithms iteratively simulate trajectories and update action-values for a set of states stored in a tree structure. These algorithms work reasonably well in practice but fail to effectively utilise the information gathered from similar states. When the action-value function is sufficiently smooth, one approach to overcoming this issue is online learning, where information is interpolated among similar states; Policy Gradient Search provides a practical algorithm to achieve this. However, Policy Gradient Search lacks an explicit exploration mechanism, which is a key feature of tree-based online search algorithms. In this paper, we propose an efficient and effective online search algorithm called Exploratory Policy Gradient Search (ExPoSe), which leverages information sharing among states by updating the search policy parameters directly, while incorporating a well-defined exploration mechanism during the online search process. We evaluate ExPoSe on a range of decision-making problems, including Atari games, Sokoban, and Hamiltonian cycle search in sparse graphs. The results demonstrate that ExPoSe consistently outperforms other popular online search algorithms across all domains. The ExPoSe source code is available at https://github.com/dixantmittal/ExPoSe.
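
The general idea in the abstract (simulate trajectories online, update shared policy parameters by gradient, and add an exploration signal) can be illustrated with a small sketch. This is not the authors' implementation and does not reproduce the ExPoSe update rule: the toy chain environment, the linear softmax policy, the REINFORCE-style update, and the count-based bonus `bonus_scale / sqrt(visits)` are all assumptions chosen only to keep the example self-contained. For the actual algorithm, see the repository linked above.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy problem: a chain of states with reward only at the right end.
N_STATES = 8
ACTIONS = (-1, +1)   # move left or move right
HORIZON = 12         # maximum simulated trajectory length


def features(state):
    # Shared features, so an update made at one state also shifts the
    # policy at nearby states (the "information sharing" idea).
    return (1.0, state / (N_STATES - 1))


def policy(theta, state):
    # Softmax over linear action preferences theta[a] . features(state).
    phi = features(state)
    logits = [sum(w * x for w, x in zip(theta[a], phi))
              for a in range(len(ACTIONS))]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action_idx):
    # Deterministic transition; the walls clamp the state to the chain.
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def search(root, n_sims=300, lr=0.5, bonus_scale=0.1, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in ACTIONS]   # search-policy parameters
    visits = defaultdict(int)               # state visit counts for the bonus
    for _ in range(n_sims):
        state, traj = root, []
        for _ in range(HORIZON):
            probs = policy(theta, state)
            a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
            nxt, reward, done = step(state, a)
            visits[nxt] += 1
            # Assumed exploration mechanism: a count-based bonus that
            # shrinks as a state is visited more often.
            reward += bonus_scale / math.sqrt(visits[nxt])
            traj.append((state, a, reward, probs))
            state = nxt
            if done:
                break
        # REINFORCE-style update using the return-to-go of each step.
        ret = 0.0
        for s, a, r, probs in reversed(traj):
            ret += r
            phi = features(s)
            for b in range(len(ACTIONS)):
                grad_logit = (1.0 if b == a else 0.0) - probs[b]
                for i, x in enumerate(phi):
                    theta[b][i] += lr * ret * grad_logit * x / len(traj)
    return theta, policy(theta, root)


if __name__ == "__main__":
    theta, root_probs = search(root=0)
    print("action probabilities at the root:", root_probs)
```

Because the parameters `theta` are shared across states through `features`, every simulated trajectory changes the policy everywhere, in contrast to a tree search that keeps separate statistics per node. Setting `bonus_scale=0.0` removes the exploration term and leaves a plain policy-gradient search for comparison.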

Code Implementations: 1 repo
