LG | ML | Jul 15, 2019

Technical Report: Partial Dependence through Stratification

arXiv:1907.06698v4 | 4 citations
Originality: Highly original
AI Analysis

This work addresses the issue of unreliable model interpretation for business analysts and scientists, offering a nonparametric alternative to existing methods.

The paper tackles the problem of biased, model-dependent partial dependence curves in machine learning interpretation. It introduces StratPD and CatStratPD, methods that compute partial dependence directly from training data without fitting a model, and reports correct results on synthetic data and plausible results on real data sets.

Partial dependence curves (FPD), introduced by Friedman, are an important model interpretation tool, but are often not accessible to business analysts and scientists who typically lack the skills to choose, tune, and assess machine learning models. It is also common for the same partial dependence algorithm on the same data to give meaningfully different curves for different models, which calls into question their precision. Expertise is required to distinguish between model artifacts and true relationships in the data. In this paper, we contribute methods for computing partial dependence curves, for both numerical (StratPD) and categorical (CatStratPD) explanatory variables, that work directly from training data rather than from the predictions of a model. Our methods provide a direct estimate of partial dependence: they approximate the partial derivative of an unknown regression function from the data, rather than first fitting a model and then approximating that model's partial derivative. We investigate settings where contemporary partial dependence methods (including FPD, ALE, and SHAP methods) give biased results. Furthermore, we demonstrate that our approach works correctly on synthetic and plausibly on real data sets. Our goal is not to argue that model-based techniques are not useful. Rather, we hope to open a new line of inquiry into nonparametric partial dependence.
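To make the idea of estimating a partial derivative directly from data more concrete, here is a minimal sketch in Python. It is not the authors' StratPD implementation, and the abstract does not describe their stratification procedure. The sketch assumes the other explanatory variable is discrete, so observations can be grouped into exact strata, and it assumes a noise-free response. The function name `stratified_partial_dependence`, its parameters, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def stratified_partial_dependence(x_c, x_other, y, n_bins=25):
    """Illustrative sketch (not the paper's algorithm).

    Groups rows into strata that share the same value of x_other, takes
    finite-difference slopes of y with respect to x_c inside each stratum,
    averages those slopes on a grid over x_c, and integrates them.
    """
    grid = np.linspace(x_c.min(), x_c.max(), n_bins + 1)
    slope_sum = np.zeros(n_bins)
    slope_cnt = np.zeros(n_bins)

    for level in np.unique(x_other):
        in_stratum = x_other == level          # x_other is held fixed here
        order = np.argsort(x_c[in_stratum])
        xs = x_c[in_stratum][order]
        ys = y[in_stratum][order]

        dx = np.diff(xs)
        keep = dx > 0                          # skip tied x_c values
        slopes = np.diff(ys)[keep] / dx[keep]  # local dy/dx_c estimates
        mids = (0.5 * (xs[:-1] + xs[1:]))[keep]

        # Assign each local slope to the grid bin containing its midpoint.
        idx = np.clip(np.searchsorted(grid, mids, side="right") - 1, 0, n_bins - 1)
        np.add.at(slope_sum, idx, slopes)
        np.add.at(slope_cnt, idx, 1.0)

    # Average slope per bin; bins with no observations stay NaN.
    mean_slope = np.full(n_bins, np.nan)
    np.divide(slope_sum, slope_cnt, out=mean_slope, where=slope_cnt > 0)

    # Integrate the averaged slopes; the curve is anchored at 0 on the left.
    curve = np.concatenate(([0.0], np.nancumsum(mean_slope * np.diff(grid))))
    return grid, mean_slope, curve


# Hypothetical synthetic data: x_c is correlated with x_other, and the true
# partial dependence of y on x_c is x_c**2 (up to an additive constant).
rng = np.random.default_rng(0)
n = 2000
x_other = rng.integers(0, 4, size=n).astype(float)
x_c = x_other + rng.uniform(0.0, 2.0, size=n)
y = x_c**2 + 10.0 * x_other

grid, mean_slope, curve = stratified_partial_dependence(x_c, x_other, y)
```

Because `x_other` is constant inside each stratum, its contribution to `y` cancels in every finite difference, so the averaged slopes should track the derivative of `x_c**2` even though the two variables are correlated. With a noisy response, differences between adjacent points become unstable and some form of smoothing or coarser differencing would be needed.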

Code Implementations: 1 repo
