Efficient Online Learning for Optimizing Value of Information: Theory and Application to Interactive Troubleshooting
This work addresses efficient decision-making under uncertainty for applications such as interactive troubleshooting. The contribution is incremental: it builds on existing value of information (VoI) concepts and adds algorithmic improvements.
The paper tackles the optimal value of information problem with an efficient online learning framework that overcomes two limitations of existing methods, poor scalability and reliance on a known distribution. It demonstrates on a real-world interactive troubleshooting application that the framework makes high-quality decisions at low cost.
We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with run time exponential in the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address these issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with a sufficient number of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypothesis distribution are unknown, we propose an algorithm that learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions at low cost.
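The two ingredients named in the abstract, sampling hypotheses instead of enumerating them all and learning unknown parameters by posterior sampling, can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes binary tests whose outcomes are independent Bernoulli variables with Beta posteriors, and it uses a simple greedy criterion (cut as much weight as possible between sampled hypotheses that lead to different decisions, per unit cost) as a stand-in for the paper's selection rule. All function and parameter names are hypothetical.

```python
import itertools
import random
from collections import Counter


def sample_hypotheses(theta, n_samples, rng):
    """Sample hypotheses (tuples of binary test outcomes) instead of
    enumerating all 2^n of them. Assumes independent Bernoulli(theta[i])."""
    counts = Counter(
        tuple(int(rng.random() < p) for p in theta) for _ in range(n_samples)
    )
    return {h: c / n_samples for h, c in counts.items()}


def disagreement_mass(weights, decision):
    """Total weight of hypothesis pairs whose best decisions differ."""
    return sum(
        weights[a] * weights[b]
        for a, b in itertools.combinations(weights, 2)
        if decision(a) != decision(b)
    )


def restrict(weights, test, outcome):
    """Keep only hypotheses consistent with seeing `outcome` on `test`."""
    return {h: w for h, w in weights.items() if h[test] == outcome}


def pick_test(weights, decision, costs, remaining):
    """Greedy stand-in rule: largest expected drop in disagreement mass
    per unit cost. Assumes the total weight is positive."""
    total = sum(weights.values())
    before = disagreement_mass(weights, decision)

    def score(t):
        expected_after = 0.0
        for outcome in (0, 1):
            sub = restrict(weights, t, outcome)
            p = sum(sub.values()) / total
            expected_after += p * disagreement_mass(sub, decision)
        return (before - expected_after) / costs[t]

    return max(sorted(remaining), key=score)


def run_episode(weights, decision, costs, true_hypothesis):
    """Run tests until all surviving sampled hypotheses agree on a decision."""
    remaining = set(range(len(costs)))
    observed, spent = {}, 0.0
    while remaining and disagreement_mass(weights, decision) > 0:
        t = pick_test(weights, decision, costs, remaining)
        remaining.remove(t)
        spent += costs[t]
        observed[t] = true_hypothesis[t]
        weights = restrict(weights, t, observed[t])
    return {decision(h) for h in weights}, spent, observed


def posterior_sample_theta(alpha, beta, rng):
    """Draw one parameter vector from the per-test Beta posteriors."""
    return [rng.betavariate(a, b) for a, b in zip(alpha, beta)]


def update_posterior(alpha, beta, observed):
    """Update the Beta counts in place with the outcomes seen this episode."""
    for t, outcome in observed.items():
        alpha[t] += outcome
        beta[t] += 1 - outcome
```

In an online loop under these assumptions, each episode would draw `theta` with `posterior_sample_theta`, build a sampled hypothesis set with `sample_hypotheses`, call `run_episode`, and then feed the observed outcomes to `update_posterior`. If the true hypothesis was never sampled, the surviving set can become empty and the returned decision set is empty; a fuller implementation would resample in that case.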