When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates
This paper addresses the problem of balancing accuracy and transparency for stakeholders in regulated industries such as banking and insurance. The contribution is incremental, as it builds on existing surrogate and interpretability methods.
The authors tackle the need for transparent decision-making in regulated industries by developing maidrr, a model-agnostic interpretable surrogate that extracts knowledge from a black box model and fits a transparent GLM. On insurance claim frequency datasets, the resulting GLM closely approximates a gradient boosting machine and outperforms linear and tree surrogates.
Highly regulated industries, like banking and insurance, ask for transparent decision-making algorithms. At the same time, competitive markets are pushing for the use of complex black box models. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These effects are used to perform smart feature engineering by grouping variable values. This results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is fit to the features in categorical format and their relevant interactions. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and a tree surrogate used as benchmarks.
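The pipeline described above (partial dependence, grouping of variable values, a model on the resulting categorical feature) can be illustrated with a minimal Python sketch. It is not the maidrr API or the authors' implementation. The `black_box` function, the toy portfolio, the greedy grouping rule, and the tolerance `tol` are all hypothetical choices made for illustration. The last step uses the fact that a Poisson GLM with a single categorical feature reproduces the per-level mean of its target. Using the black box predictions as that target is an assumption of this sketch.

```python
from statistics import mean

# Hypothetical black box: stands in for a fitted GBM's predicted claim frequency.
def black_box(age, power):
    base = 0.20 if age < 25 else 0.10 if age < 60 else 0.14
    return base * (1.0 + 0.01 * power)

# Toy portfolio of (age, power) rows; the values are invented for illustration.
data = [(a, p) for a in (18, 22, 30, 45, 58, 65, 80) for p in (4, 8, 12)]

def partial_dependence(model, rows, grid):
    """Average prediction over all rows when 'age' is forced to each grid value."""
    return {v: mean(model(v, p) for _, p in rows) for v in grid}

def group_by_effect(pd_effect, tol):
    """Greedily merge consecutive values whose PD effect stays within tol
    of the first value in the current group (a simple stand-in rule)."""
    groups, current = [], []
    for v in sorted(pd_effect):
        if current and abs(pd_effect[v] - pd_effect[current[0]]) > tol:
            groups.append(current)
            current = []
        current.append(v)
    groups.append(current)
    return groups

ages = sorted({a for a, _ in data})
pd_age = partial_dependence(black_box, data, ages)
age_groups = group_by_effect(pd_age, tol=0.01)
print(age_groups)  # [[18, 22], [30, 45, 58], [65, 80]]

# Map each age to its segment, then take the per-segment mean of the
# black box predictions (assumed target) as the one-feature surrogate.
level = {v: i for i, g in enumerate(age_groups) for v in g}
surrogate = {
    i: mean(black_box(a, p) for a, p in data if level[a] == i)
    for i in range(len(age_groups))
}
```

Here `surrogate` maps each age segment to a single predicted frequency, which is the kind of transparent lookup a categorical GLM provides. A realistic setting would involve several features, interactions, and a principled choice of the number of groups.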