On Sequential Fault-Intolerant Process Planning
This addresses a reward structure critical for applications like drug discovery and security, though it is incremental as it builds on existing multi-armed bandit methods.
The paper tackles the Sequential Fault-Intolerant Process Planning (SFIPP) problem, where success requires all stages to succeed, and develops provably tight online algorithms for deterministic and probabilistic action outcomes, with empirical results showing specialized algorithms outperform general ones.
We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed. Such reward structures are different from classic additive reward structures and arise in important applications such as drug/material discovery, security, and quality-critical product design. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage. We do so both for the foundational case in which the behavior of actions is deterministic, and the case of probabilistic action outcomes, where we effectively balance exploration for learning and exploitation for planning through the usage of multi-armed bandit algorithms. In our empirical evaluations, we demonstrate that the specialized algorithms we develop, which leverage additional information about the structure of the SFIPP instance, outperform our more general algorithm.