Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
This work addresses the challenge of improving the interpretability of complex machine learning models, which matters to users in fields that require transparent AI. The contribution is incremental, since it builds on existing functional decomposition and optimization methods.
The paper tackles the problem that post-hoc interpretation methods produce misleading and verbose results when applied to overly complex models, particularly where feature interactions are involved. It proposes model-agnostic complexity measures based on functional decomposition. The results show that models minimizing these measures yield more reliable and compact interpretations, and the measures are demonstrated in a multi-objective optimization approach that reduces loss and complexity simultaneously.
Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially with respect to feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength, and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach that simultaneously minimizes loss and complexity.
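To make two of the three measures concrete, here is a minimal sketch, not the paper's implementation. It assumes that partial dependence functions stand in for the main effects of the functional decomposition, which the text above does not specify. It also assumes that a feature counts as "used" if shuffling it changes any prediction, and that interaction strength is the share of prediction variance left unexplained by the additive main-effect approximation. The function names (n_features_used, partial_dependence, interaction_strength) and the toy models are hypothetical. Main effect complexity is omitted because its definition is not given here.

```python
import numpy as np


def n_features_used(predict, X, rng, tol=1e-12):
    """Count features whose shuffling changes at least one prediction.

    Assumption: this shuffling check is an illustrative stand-in for
    the "number of features used" measure.
    """
    base = predict(X)
    used = 0
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X[:, j])
        if np.max(np.abs(predict(X_perm) - base)) > tol:
            used += 1
    return used


def partial_dependence(predict, X, j):
    """Partial dependence of feature j at each observed value of that feature."""
    pd_vals = np.empty(X.shape[0])
    for i, v in enumerate(X[:, j]):
        X_mod = X.copy()
        X_mod[:, j] = v  # fix feature j to v, average over all other features
        pd_vals[i] = predict(X_mod).mean()
    return pd_vals


def interaction_strength(predict, X):
    """Share of prediction variance not captured by the main effects alone.

    Assumption: main effects are centered partial dependence functions.
    Returns 0 for a purely additive model.
    """
    y = predict(X)
    f0 = y.mean()
    additive = np.full_like(y, f0)
    for j in range(X.shape[1]):
        additive += partial_dependence(predict, X, j) - f0
    total = np.sum((y - f0) ** 2)
    if total == 0.0:
        return 0.0  # constant model: nothing to explain
    return float(np.sum((y - additive) ** 2) / total)


# Hypothetical toy models on three independent features; the third is unused.
def additive_model(X):
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2


def interacting_model(X):
    return X[:, 0] * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))

for name, model in [("additive", additive_model), ("interacting", interacting_model)]:
    print(name, n_features_used(model, X, rng), interaction_strength(model, X))
```

In this sketch the additive model scores an interaction strength of essentially zero, while the pure product model scores close to one, so a partial dependence plot of the latter would hide most of what the model does. In a multi-objective setting, such scores would be minimized alongside the loss.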