Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning
The work addresses the need for models that are both accurate and understandable in domains that require interpretability, such as healthcare and finance. The contribution is incremental, since it refines existing hybrid approaches.
The paper tackles the trade-off between predictive performance and interpretability in machine learning. It proposes Partially Interpretable Estimators (PIE), which combine an interpretable model for the main per-feature contributions with a black-box model for feature interactions. In the reported experiments, PIE is competitive with black-box models and outperforms interpretable baselines.
We propose Partially Interpretable Estimators (PIE), which attribute a prediction to individual features via an interpretable model, while a (possibly) small part of the PIE prediction is attributed to the interaction of features via a black-box model, with the goal of boosting predictive performance while maintaining interpretability. As such, the interpretable model captures the main contributions of features, and the black-box model complements the interpretable piece by capturing the "nuances" of feature interactions as a refinement. We design an iterative training algorithm to jointly train the two types of models. Experimental results show that PIE is highly competitive with black-box models while outperforming interpretable baselines. In addition, the understandability of PIE is comparable to that of simple linear models, as validated via a human evaluation.
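The joint training idea can be illustrated with a backfitting-style alternation in which each model is refit on what the other leaves unexplained. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not specify the model classes or the update rule. Here the interpretable part is assumed to be a plain additive linear model, the black box is stood in for by a k-nearest-neighbour regressor, and the names `fit_additive_linear`, `fit_knn`, and `fit_pie`, as well as the synthetic data, are hypothetical.

```python
import random


def linear_predict(w, b, x):
    """Interpretable prediction: intercept plus one additive term per feature."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_additive_linear(X, target, sweeps=50):
    """Assumed interpretable model: least squares fit by coordinate descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(sweeps):
        # Refit the intercept, then each weight, holding the others fixed.
        b = sum(t - sum(wj * xj for wj, xj in zip(w, x))
                for x, t in zip(X, target)) / n
        for j in range(d):
            num = den = 0.0
            for x, t in zip(X, target):
                partial = t - b - sum(w[k] * x[k] for k in range(d) if k != j)
                num += x[j] * partial
                den += x[j] * x[j]
            w[j] = num / den if den > 0 else 0.0
    return w, b


def fit_knn(X, target, k=5):
    """Stand-in black box (an assumption): k-nearest-neighbour regression."""
    data = list(zip(X, target))

    def predict(x):
        nearest = sorted(
            data, key=lambda p: sum((a - c) ** 2 for a, c in zip(p[0], x))
        )[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def fit_pie(X, y, rounds=3):
    """Alternate the two fits; each model targets the other's residual."""
    h_pred = [0.0] * len(X)  # black-box contribution starts at zero
    for _ in range(rounds):
        # Interpretable model explains what the black box has not.
        w, b = fit_additive_linear(X, [yi - hi for yi, hi in zip(y, h_pred)])
        g_pred = [linear_predict(w, b, x) for x in X]
        # Black box refines what the interpretable model leaves over.
        h = fit_knn(X, [yi - gi for yi, gi in zip(y, g_pred)])
        h_pred = [h(x) for x in X]
    return w, b, h


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


# Synthetic data: two main effects plus a small interaction term and noise.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [2 * x1 - x2 + 0.5 * x1 * x2 + random.gauss(0, 0.05) for x1, x2 in X]

w, b, h = fit_pie(X, y)
w_lin, b_lin = fit_additive_linear(X, y)  # interpretable-only baseline

mse_pie = mse([linear_predict(w, b, x) + h(x) for x in X], y)
mse_linear = mse([linear_predict(w_lin, b_lin, x) for x in X], y)
print("per-feature weights:", w)
print("training MSE, linear only:", mse_linear)
print("training MSE, linear + black box:", mse_pie)
```

In this sketch the weights `w` remain directly readable as per-feature contributions, while `h` only adjusts for what the additive part misses. The errors printed are training errors on toy data and say nothing about the results reported in the paper.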