Global Sequential Testing for Multi-Stream Auditing
This work addresses the need for efficient auditing of machine learning systems in risk-sensitive areas by proposing sequential hypothesis tests that improve on the expected stopping time of the standard Bonferroni-corrected global test.
The paper tackles the problem of quickly detecting unusual behavior in machine learning systems monitored over $k$ data streams. It develops new sequential tests with improved expected stopping times: the balanced test achieves a bound of $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under dense alternatives, compared to the $O\left(\ln\frac{k}{\alpha}\right)$ bound of the standard Bonferroni-corrected test.
Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and to detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis asserting that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by merging test martingales; these tests offer different trade-offs in expected stopping time under sparse and dense alternative hypotheses. We further derive a new, balanced test whose expected stopping time bound matches Bonferroni's in the sparse setting and improves to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
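
To illustrate how different ways of merging test martingales can trade off stopping times, the following is a minimal simulation sketch. It is not the construction from the paper. It assumes independent Gaussian streams with unit variance, a per-stream likelihood-ratio martingale with a fixed bet `lam`, and two textbook merging rules (the average and the product of the per-stream martingales) alongside the Bonferroni rule. The function name `stopping_times` and all parameters are hypothetical. The product rule is only a valid test martingale here because the simulated streams are independent.

```python
import math
import random


def stopping_times(means, alpha=0.05, lam=0.5, max_steps=10_000, seed=0):
    """Simulate k Gaussian streams and record when each global test rejects.

    Assumptions (illustrative only): stream i emits N(means[i], 1) samples,
    the null is mean 0 for every stream, and each stream's test martingale
    is M_i(t) = prod_s exp(lam * x_s - lam**2 / 2).
    Returns a dict of first rejection times (None if never rejected).
    """
    rng = random.Random(seed)
    k = len(means)
    log_m = [0.0] * k  # log of each per-stream martingale
    stopped = {"bonferroni": None, "average": None, "product": None}

    for t in range(1, max_steps + 1):
        for i, mu in enumerate(means):
            x = rng.gauss(mu, 1.0)
            log_m[i] += lam * x - 0.5 * lam * lam

        top = max(log_m)
        # log of the average martingale, computed stably via log-sum-exp
        log_avg = top + math.log(sum(math.exp(v - top) for v in log_m)) - math.log(k)
        # log of the product martingale (valid only for independent streams)
        log_prod = sum(log_m)

        crossed = {
            # Bonferroni: some stream exceeds k / alpha
            "bonferroni": top >= math.log(k / alpha),
            # Average and product: merged martingale exceeds 1 / alpha
            "average": log_avg >= math.log(1 / alpha),
            "product": log_prod >= math.log(1 / alpha),
        }
        for name, hit in crossed.items():
            if hit and stopped[name] is None:
                stopped[name] = t
        if all(v is not None for v in stopped.values()):
            break
    return stopped


if __name__ == "__main__":
    k = 50
    print("dense :", stopping_times([0.5] * k, seed=1))
    print("sparse:", stopping_times([0.5] + [0.0] * (k - 1), seed=1))
```

In this sketch the average rule can never stop later than Bonferroni, since a single stream exceeding $k/\alpha$ forces the average above $1/\alpha$. The product rule accumulates evidence from all streams at once, which is what one would expect to help when every stream deviates from the null, and to hurt when only a few do.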