Greedy Approximation Algorithms for Active Sequential Hypothesis Testing
This work proposes a new method for efficient hypothesis testing in applications such as genomics-based cancer detection, where the massive number of hypotheses or actions is a known bottleneck.
The paper tackles active sequential hypothesis testing (ASHT) with greedy algorithms that aim to minimize the number of actions needed to identify the true hypothesis with a target error probability. The approximation guarantees are independent of the number of actions and logarithmic in the number of hypotheses. In evaluations on synthetic and real-world DNA mutation data, the algorithms outperform previously proposed heuristic policies by large margins.
In the problem of active sequential hypothesis testing (ASHT), a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the probability distribution of the outcome of any action under any true hypothesis. Given a target error $\delta > 0$, the goal is to sequentially select as few actions as possible so as to identify the true hypothesis with probability at least $1 - \delta$. Motivated by applications in which the number of hypotheses or actions is massive (e.g., genomics-based cancer detection), we propose efficient (greedy, in fact) algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real-world DNA mutation data, demonstrating that our algorithms outperform previously proposed heuristic policies by large margins.
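To make the problem setup concrete, the following is a minimal sketch of an ASHT loop in Python. It is not the algorithm proposed in the paper: the action rule (pick the action that best separates the two currently most likely hypotheses) is a simple stand-in heuristic, and the binary outcomes, the uniform prior, the posterior-based stopping rule, and all names (`PROBS`, `pick_action`, `run_asht`, `max_steps`) are assumptions made for illustration only.

```python
import random


def posterior_update(posterior, probs, action, outcome):
    """Bayes update for a binary outcome.

    probs[h][a] is the (assumed known) probability that action a
    returns outcome 1 when hypothesis h is true.
    """
    weights = []
    for h, prior_h in enumerate(posterior):
        p_one = probs[h][action]
        likelihood = p_one if outcome == 1 else 1.0 - p_one
        weights.append(prior_h * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def pick_action(posterior, probs):
    """Illustrative heuristic (not the paper's rule): choose the action
    whose outcome distribution differs most between the two hypotheses
    that currently have the highest posterior mass."""
    ranked = sorted(range(len(posterior)), key=lambda h: posterior[h], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    n_actions = len(probs[0])
    return max(range(n_actions), key=lambda a: abs(probs[top][a] - probs[runner_up][a]))


def run_asht(probs, true_h, delta, rng, max_steps=10_000):
    """Select actions sequentially until one hypothesis has posterior
    mass at least 1 - delta (assuming a uniform prior), or until the
    hypothetical safety cap max_steps is reached."""
    n = len(probs)
    posterior = [1.0 / n] * n
    steps = 0
    while max(posterior) < 1.0 - delta and steps < max_steps:
        action = pick_action(posterior, probs)
        # Simulate the outcome of the chosen action under the true hypothesis.
        outcome = 1 if rng.random() < probs[true_h][action] else 0
        posterior = posterior_update(posterior, probs, action, outcome)
        steps += 1
    guess = max(range(n), key=lambda h: posterior[h])
    return guess, steps, posterior


# Toy instance: 3 hypotheses (rows) and 3 actions (columns); values are made up.
PROBS = [
    [0.9, 0.5, 0.2],
    [0.1, 0.5, 0.2],
    [0.9, 0.5, 0.8],
]

guess, steps, posterior = run_asht(PROBS, true_h=0, delta=0.05, rng=random.Random(0))
print(f"guessed hypothesis {guess} after {steps} actions")
```

The number of actions used, `steps`, is the quantity that ASHT seeks to keep small. Note that the second action in the toy instance is uninformative (identical outcome probabilities under every hypothesis), so a sensible policy should never select it.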