Bayesian Stress Testing of Models in a Classification Hierarchy
This work addresses the need for reliable stress testing in real-life ML applications such as financial fraud detection, but the contribution appears incremental because it builds on existing hierarchical modeling approaches.
The paper tackles stress testing of machine learning models in hierarchical classification systems. It proposes a Bayesian framework that models the interactions among the models in the hierarchy, and demonstrates it on a toy problem and a financial fraud detection dataset, with the aim of increasing confidence in performance before deployment.
Building a machine learning solution for real-life applications often involves decomposing the problem into multiple models of varying complexity. This has advantages in terms of overall performance, interpretability of the outcomes, and ease of model maintenance. In this work we propose a Bayesian framework to model the interaction amongst models in such a hierarchy. We show that the framework can facilitate stress testing of the overall solution, giving more confidence in its expected performance prior to active deployment. Finally, we test the proposed framework on a toy problem and a financial fraud detection dataset to demonstrate how it can be applied to any machine-learning-based solution, regardless of the underlying modelling required.
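The abstract does not spell out the framework itself, so the following is only a minimal sketch of what Bayesian stress testing of a model hierarchy could look like, not the authors' method. It assumes a hypothetical two-stage serial hierarchy in which a fraud case is caught only if both stages flag it, treats the stage recalls as independent, and places a Beta posterior on each stage's recall. The validation counts, the function names, and the stress scenario are all made up for illustration.

```python
import random


def beta_posterior(hits, misses, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a detection rate, from a Beta prior and counts."""
    return prior_a + hits, prior_b + misses


def sample_overall_recall(stage_posteriors, n_samples=10000, seed=0):
    """Monte Carlo samples of end-to-end recall for a serial hierarchy.

    Assumption (not from the paper): a case is caught only if every stage
    flags it, and the stage recalls are independent of each other.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        recall = 1.0
        for a, b in stage_posteriors:
            recall *= rng.betavariate(a, b)
        samples.append(recall)
    return samples


def summarize(samples, lower=0.05, upper=0.95):
    """Return the mean and an empirical (lower, upper) quantile interval."""
    ordered = sorted(samples)
    n = len(ordered)
    mean = sum(ordered) / n
    return mean, ordered[int(lower * n)], ordered[int(upper * n)]


# Illustrative validation counts (hits, misses) per stage; invented for this sketch.
baseline = [beta_posterior(90, 10), beta_posterior(80, 20)]

# Hypothetical stress scenario: the first-stage model degrades after deployment.
stressed = [beta_posterior(70, 30), beta_posterior(80, 20)]

base_mean, base_lo, base_hi = summarize(sample_overall_recall(baseline))
stress_mean, stress_lo, stress_hi = summarize(sample_overall_recall(stressed))

print(f"baseline recall: mean={base_mean:.3f}, 90% interval=({base_lo:.3f}, {base_hi:.3f})")
print(f"stressed recall: mean={stress_mean:.3f}, 90% interval=({stress_lo:.3f}, {stress_hi:.3f})")
```

Comparing the two intervals shows how degradation in one model propagates to the overall solution, which is the kind of question a stress test is meant to answer before deployment. A real hierarchy would likely need dependence between stages, which this sketch deliberately ignores.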