Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring
This work addresses the need of banking supervisors and model validators to monitor and understand the performance of credit scoring models. The contribution is incremental, as it adapts Shapley values to performance decomposition.
The paper tackles the problem of identifying the key drivers of predictive performance in credit scoring models. It introduces the XPER methodology, which decomposes performance metrics such as AUC into feature contributions, and demonstrates on a car loan dataset that a few features explain a large part of model performance. It also shows how XPER can be used to address heterogeneity and improve performance.
As they play an increasingly important role in determining access to credit, credit scoring models are under growing scrutiny from banking supervisors and internal model validators. These authorities need to monitor model performance and identify its key drivers. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, $R^2$) into specific contributions associated with the various features of a forecasting model. XPER is theoretically grounded in Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine-learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Notably, the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). Finally, we show how XPER can be used to deal with heterogeneity issues and improve performance.
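To make the idea of decomposing a performance metric into feature contributions concrete, the following is a minimal sketch of a Shapley-style decomposition of the AUC. It is not the authors' XPER implementation. It assumes a toy scoring rule (`predict`), synthetic data, and a crude marginalisation scheme in which features outside a coalition are replaced by values taken from other rows via a cyclic shift. The names `shapley_performance`, `predict`, and `shift` are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def auc(y, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def shapley_performance(X, y, predict, metric, shift=1):
    """Exact Shapley decomposition of metric(y, predict(X)) over features.

    Assumption: a feature outside the coalition S is "switched off" by
    taking its value from another row (cyclic shift), which breaks its
    link with the target. This is a simplification for illustration.
    """
    n, d = len(X), len(X[0])

    def value(S):
        rows = [
            [X[i][j] if j in S else X[(i + shift * (j + 1)) % n][j]
             for j in range(d)]
            for i in range(n)
        ]
        return metric(y, [predict(r) for r in rows])

    # Evaluate the metric once for every coalition of features.
    cache = {
        S: value(set(S))
        for k in range(d + 1)
        for S in itertools.combinations(range(d), k)
    }

    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            # Standard Shapley weight for a coalition of this size.
            w = (math.factorial(size) * math.factorial(d - size - 1)
                 / math.factorial(d))
            for S in itertools.combinations(others, size):
                with_j = tuple(sorted(S + (j,)))
                total += w * (cache[with_j] - cache[S])
        phi.append(total)
    return cache[()], phi


# Synthetic data (assumption): three features, the third is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if 2 * r[0] + 0.5 * r[1] + rng.gauss(0, 1) > 0 else 0 for r in X]


def predict(row):
    """Toy scoring rule standing in for a trained model; ignores feature 2."""
    return 2 * row[0] + 0.5 * row[1]


benchmark, phi = shapley_performance(X, y, predict, auc)
full_auc = auc(y, [predict(r) for r in X])

print("benchmark AUC:", round(benchmark, 3))
print("contributions:", [round(p, 3) for p in phi])
print("benchmark + sum of contributions:", round(benchmark + sum(phi), 3))
print("full-model AUC:", round(full_auc, 3))
```

By the efficiency property of Shapley values, the benchmark value plus the sum of the contributions equals the full-model AUC, and a feature the scoring rule never uses receives a contribution of zero. The exact enumeration over coalitions grows as $2^d$ in the number of features $d$, so this sketch is only practical for a handful of features.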