ML, Jul 29, 2022
SHAP for additively modeled features in a boosted trees model
Michael Mayer
SHAP (SHapley Additive exPlanations) is an important technique for exploring a black-box machine learning (ML) model. SHAP values decompose predictions into contributions of the features in a fair way. We will show that for a boosted trees model in which some or all features are modeled additively, the SHAP dependence plot of such a feature corresponds to its partial dependence plot up to a vertical shift. We illustrate the result with XGBoost.
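The vertical-shift claim can be checked by hand on a small example. The following is a minimal sketch, not the paper's setup: instead of a boosted trees model and TreeSHAP, it assumes a hypothetical toy function `model` in which the first feature enters additively, a made-up `background` dataset, and exact Shapley values computed by brute force from an interventional value function.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical toy model: x[0] is modeled additively, x[1] and x[2] interact.
    return 2.0 * x[0] ** 2 + x[1] * x[2]

# Made-up background data used to average out "absent" features.
background = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5), (2.0, 0.0, 1.0), (-1.0, 2.0, -1.0)]

def value(x, subset):
    # Interventional value function: features in `subset` are fixed at x,
    # the remaining ones are taken from each background row in turn.
    total = 0.0
    for b in background:
        z = tuple(x[j] if j in subset else b[j] for j in range(len(x)))
        total += model(z)
    return total / len(background)

def shapley(x, i):
    # Exact Shapley value of feature i by enumerating all coalitions.
    p = len(x)
    others = [j for j in range(p) if j != i]
    phi = 0.0
    for size in range(p):
        weight = factorial(size) * factorial(p - size - 1) / factorial(p)
        for s in combinations(others, size):
            phi += weight * (value(x, set(s) | {i}) - value(x, set(s)))
    return phi

def partial_dependence(i, v):
    # Average prediction over the background with feature i set to v.
    rows = [tuple(v if j == i else b[j] for j in range(len(b))) for b in background]
    return sum(model(r) for r in rows) / len(rows)

points = [(-1.0, 0.3, 0.7), (0.5, 2.0, -1.0), (2.0, -0.5, 1.5)]

# Difference between partial dependence and SHAP value of the additive feature.
shifts = [partial_dependence(0, x[0]) - shapley(x, 0) for x in points]
average_prediction = sum(model(b) for b in background) / len(background)
print(shifts, average_prediction)
```

In this toy setting the difference is the same at every point and equals the average prediction over the background data, so the two curves differ only by a constant. For the non-additive features no such constant shift should be expected.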
ML, Aug 18, 2025
Shapley Values: Paired-Sampling Approximations
Michael Mayer, Mario V. Wüthrich
Originally introduced in cooperative game theory, Shapley values have become a very popular tool to explain machine learning predictions. Based on Shapley's fairness axioms, every input (feature component) receives a credit for how it contributes to an output (prediction). These credits are then used to explain the prediction. The only limitation in computing the Shapley values (credits) for many different predictions is computational. There are two popular sampling approximations, sampling KernelSHAP and sampling PermutationSHAP. Our first novel contributions are asymptotic normality results for these sampling approximations. Next, we show that the paired-sampling approaches provide exact results when interactions are of order at most two. Furthermore, the paired-sampling PermutationSHAP possesses the additive recovery property, whereas its kernel counterpart does not.
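The exactness of paired sampling for interactions of order at most two can be illustrated on a small cooperative game. The sketch below is an assumption-laden toy, not the authors' implementation: the game `v` with random main effects and pairwise interactions, and all function names, are hypothetical. It compares brute-force Shapley values with the average of the marginal contributions along one random permutation and its reverse.

```python
import random
from itertools import permutations

p = 4
random.seed(0)

# Hypothetical game with main effects and pairwise interactions only.
main = [random.uniform(-1, 1) for _ in range(p)]
pair = {(i, j): random.uniform(-1, 1) for i in range(p) for j in range(i + 1, p)}

def v(coalition):
    # Value of a coalition: main effects plus interactions of pairs fully inside it.
    s = set(coalition)
    total = sum(main[i] for i in s)
    total += sum(w for (i, j), w in pair.items() if i in s and j in s)
    return total

def marginals(order):
    # Marginal contribution of each player when players join in the given order.
    phi = [0.0] * p
    members = []
    previous = v(members)
    for i in order:
        members.append(i)
        current = v(members)
        phi[i] = current - previous
        previous = current
    return phi

def exact_shapley():
    # Brute force: average marginal contributions over all p! permutations.
    perms = list(permutations(range(p)))
    sums = [0.0] * p
    for order in perms:
        for i, m in enumerate(marginals(order)):
            sums[i] += m
    return [s / len(perms) for s in sums]

def paired_estimate(order):
    # Paired sampling: average over a permutation and its reverse.
    forward = marginals(order)
    backward = marginals(order[::-1])
    return [(f + b) / 2 for f, b in zip(forward, backward)]

order = list(range(p))
random.shuffle(order)

exact = exact_shapley()
paired = paired_estimate(order)   # one permutation pair
single = marginals(order)         # one unpaired permutation, for comparison
print(exact, paired, single)
```

For this game a single permutation pair already reproduces the brute-force values up to floating-point error, because each pairwise interaction is credited once in the forward pass and once in the reverse pass. The unpaired estimate from one permutation generally does not match.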
ML, Feb 25, 2022
Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice
Tobias Fissler, Christian Lorentzen, Michael Mayer
One of the main tasks of actuaries and data scientists is to build good predictive models for certain phenomena such as the claim size or the number of claims in insurance. These models ideally exploit given feature information to enhance the accuracy of prediction. This user guide revisits and clarifies statistical techniques to assess the calibration or adequacy of a model on the one hand, and to compare and rank different models on the other hand. In doing so, it emphasises the importance of specifying the prediction target functional at hand a priori (e.g. the mean or a quantile) and of choosing the scoring function in model comparison in line with this target functional. Guidance for the practical choice of the scoring function is provided. Striving to bridge the gap between science and daily practice in application, it focuses mainly on the pedagogical presentation of existing results and of best practice. The results are accompanied and illustrated by two real data case studies on workers' compensation and customer churn.
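The link between target functional and scoring function can be made concrete with two standard examples: the squared error, which is consistent for the mean, and the pinball loss, which is consistent for a quantile. The sketch below is illustrative only; the observations are made up, and the grid search over constant forecasts is a simplification chosen for readability.

```python
def squared_error(forecast, y):
    # Consistent scoring function for the mean.
    return (forecast - y) ** 2

def make_pinball_loss(alpha):
    # Consistent scoring function for the alpha-quantile.
    def pinball_loss(forecast, y):
        if forecast >= y:
            return (1 - alpha) * (forecast - y)
        return alpha * (y - forecast)
    return pinball_loss

def mean_score(score, forecast, observations):
    # Average score of one constant forecast over all observations.
    return sum(score(forecast, y) for y in observations) / len(observations)

# Made-up, right-skewed observations (for example claim sizes).
observations = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0, 12.0, 25.0]

# Candidate constant forecasts on a grid from 0.0 to 30.0 in steps of 0.1.
candidates = [k / 10 for k in range(301)]

best_for_mean = min(candidates, key=lambda z: mean_score(squared_error, z, observations))
best_for_q75 = min(candidates, key=lambda z: mean_score(make_pinball_loss(0.75), z, observations))
sample_mean = sum(observations) / len(observations)
print(best_for_mean, sample_mean, best_for_q75)
```

On this sample the squared error selects the sample mean, while the pinball loss with `alpha = 0.75` selects a larger value, the eighth smallest observation. Ranking forecasts with a scoring function that does not match the intended target can therefore favour the wrong model.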
HC, Dec 10, 2015
Graph-theoretic autofill
Michael Mayer, Dominic van der Zypen
Imagine a website that asks the user to fill in a web form and, based on the input values, derives a relevant figure, for instance an expected salary, a medical diagnosis or the market value of a house. How should missing input values be handled at run-time? Besides using fixed defaults, a more sophisticated approach is to use predefined dependencies (logical or correlational) between different fields to autofill missing values in an iterative way. Directed loopless graphs (in which cycles are allowed) are the ideal mathematical model to formalize these dependencies. We present two new graph-theoretic approaches to filling missing values at run-time.