LG | ML | Feb 11, 2020

Lifting Interpretability-Performance Trade-off via Automated Feature Engineering

arXiv:2002.04267v1 | 1 citation
AI Analysis

This work addresses the problem of balancing accuracy and interpretability in machine learning for practitioners, though it is incremental as it builds on existing feature engineering and surrogate model techniques.

The paper tackles the trade-off between model interpretability and performance by using black-box surrogate models to engineer features for interpretable glass-box models. It reports improved performance for linear models on tabular datasets and challenges the notion that complex models always outperform linear ones.

Complex black-box predictive models may achieve high performance, but their lack of interpretability causes problems such as lack of trust, lack of stability, and sensitivity to concept drift. On the other hand, achieving satisfactory accuracy with interpretable models requires more time-consuming feature engineering work. Can we train interpretable and accurate models without tedious feature engineering? We propose a method that uses elastic black-boxes as surrogate models to create simpler, less opaque, yet still accurate and interpretable glass-box models. The new models are trained on newly engineered features extracted with the help of a surrogate model. We support the analysis with a large-scale benchmark on several tabular data sets from the OpenML database. There are two results: 1) extracting information from complex models may improve the performance of linear models, and 2) the benchmark calls into question a common myth that complex machine learning models outperform linear models.
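
The idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say how features are extracted from the surrogate, so the sketch assumes one simple option, cutting a continuous feature at the point where the surrogate's response changes most. The k-nearest-neighbour "black box", the synthetic data, and every function name below are hypothetical choices made for illustration only.

```python
import random
from statistics import mean

def knn_predict(train_x, train_y, x, k=15):
    """Flexible 'black-box' stand-in (assumption): mean target of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def fit_line(xs, ys):
    """Ordinary least squares with a single feature; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return mean((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))

def surrogate_threshold(train_x, train_y, grid):
    """Assumed extraction rule: cut where the surrogate's response jumps most between grid points."""
    resp = [knn_predict(train_x, train_y, g) for g in grid]
    i = max(range(len(grid) - 1), key=lambda j: abs(resp[j + 1] - resp[j]))
    return (grid[i] + grid[i + 1]) / 2

# Synthetic data (illustrative only): a step in the target at x = 6 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [(1.0 if x > 6.0 else 0.0) + rng.gauss(0, 0.1) for x in xs]

# 1) Query the surrogate on a grid and derive a cut point from its response.
grid = [i * 0.25 for i in range(41)]  # 0.0, 0.25, ..., 10.0
cut = surrogate_threshold(xs, ys, grid)

# 2) Engineer a binary feature from the cut and fit a linear (glass-box) model on it.
binned = [1.0 if x > cut else 0.0 for x in xs]
raw_mse = mse(xs, ys, *fit_line(xs, ys))
eng_mse = mse(binned, ys, *fit_line(binned, ys))

print("cut point:", cut)
print("linear model on raw feature, MSE:", raw_mse)
print("linear model on engineered feature, MSE:", eng_mse)
```

The final model stays a one-coefficient linear model and is therefore easy to read, while the nonlinearity is captured by the engineered feature rather than by the model itself. Errors here are measured on the training data for brevity; a real comparison would use held-out data.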

Code Implementations: 1 repo