Progress Constraints for Reinforcement Learning in Behavior Trees
The paper addresses safe and sample-efficient reinforcement learning (RL) in structured decision-making for robotics and autonomous systems. The contribution is incremental, since it builds on existing approaches to integrating behavior trees (BTs) with RL.
The paper tackles the naive integration of Behavior Trees (BTs) and Reinforcement Learning (RL), which can lead to controllers counteracting each other, undoing previously achieved subgoals, and degrading performance. It proposes progress constraints, in which feasibility estimators restrict the allowed action set based on theoretical BT convergence results. Empirical evaluations in a 2D proof-of-concept and a high-fidelity warehouse environment show improved performance, sample efficiency, and constraint satisfaction compared to prior BT-RL integration methods.
Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used to switch between sub-controllers based on environmental conditions. Reinforcement Learning (RL), on the other hand, can learn near-optimal controllers but sometimes struggles with sparse rewards, safe exploration, and long-horizon credit assignment. Combining BTs with RL has the potential for mutual benefit: a BT design encodes structured domain knowledge that can simplify RL training, while RL enables automatic learning of the controllers within BTs. However, naive integration of BTs and RL can lead to some controllers counteracting other controllers, possibly undoing previously achieved subgoals, thereby degrading the overall performance. To address this, we propose progress constraints, a novel mechanism where feasibility estimators constrain the allowed action set based on theoretical BT convergence results. Empirical evaluations in a 2D proof-of-concept and a high-fidelity warehouse environment demonstrate improved performance, sample efficiency, and constraint satisfaction, compared to prior methods of BT-RL integration.
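The idea of constraining the allowed action set can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `allowed_actions`, `select_action`, and `keeps_x_at_least_2`, the 0.5 threshold, the fallback rule, and the toy 1D state are all assumptions made for illustration. In the paper the feasibility estimators are part of the proposed mechanism, whereas here a hand-written stub stands in for one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[int, ...]
Action = str
# Hypothetical signature: estimated feasibility in [0, 1] that a previously
# achieved subgoal stays satisfied after taking `action` in `state`.
FeasibilityEstimator = Callable[[State, Action], float]


def allowed_actions(
    state: State,
    actions: Sequence[Action],
    estimators: Sequence[FeasibilityEstimator],
    threshold: float = 0.5,
) -> List[Action]:
    """Keep only actions that every estimator rates at or above the threshold."""
    return [a for a in actions if all(f(state, a) >= threshold for f in estimators)]


def select_action(
    state: State,
    actions: Sequence[Action],
    q_values: Dict[Action, float],
    estimators: Sequence[FeasibilityEstimator],
    threshold: float = 0.5,
) -> Action:
    """Pick the highest-valued action from the constrained action set."""
    candidates = allowed_actions(state, actions, estimators, threshold)
    if not candidates:
        # Fallback (an assumption of this sketch): if no action satisfies all
        # constraints, take the one with the best worst-case feasibility.
        return max(actions, key=lambda a: min(f(state, a) for f in estimators))
    return max(candidates, key=lambda a: q_values[a])


# Toy 1D example: the state is (x,), and an earlier subtree achieved "x >= 2".
ACTIONS = ["left", "stay", "right"]
STEP = {"left": -1, "stay": 0, "right": 1}


def keeps_x_at_least_2(state: State, action: Action) -> float:
    """Hand-written stand-in for a learned feasibility estimator."""
    return 1.0 if state[0] + STEP[action] >= 2 else 0.0


# Values for the currently active subgoal, which happens to favour moving left.
Q_VALUES = {"left": 1.0, "stay": 0.2, "right": 0.5}

if __name__ == "__main__":
    # At x = 2, "left" would undo the earlier subgoal, so it is masked out.
    print(select_action((2,), ACTIONS, Q_VALUES, [keeps_x_at_least_2]))
    # At x = 5, "left" keeps x >= 2, so the unconstrained greedy choice is allowed.
    print(select_action((5,), ACTIONS, Q_VALUES, [keeps_x_at_least_2]))
```

In this sketch the constraint only removes actions that would undo the earlier subgoal. Away from the boundary, the learned values decide freely, which is the intended effect of letting the BT structure restrict rather than replace the RL controller.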