LG · ML · Jul 20, 2020

Towards Ground Truth Explainability on Tabular Data

arXiv:2007.10532v1 · 10 citations
Originality: Synthesis-oriented
AI Analysis

The paper gives data scientists a tool for testing and understanding explainability methods on tabular data. The contribution is incremental: it adapts an existing approach from image data to a new domain.

The paper tackles the lack of ground truth for explanations in post hoc explainability on tabular data by proposing a method using copulas to create synthetic datasets with controlled statistical properties, enabling users to build intuition through three demonstrated use cases.

In data science, there is a long history of using synthetic data for method development, feature selection and feature engineering. Our current interest in synthetic data comes from recent work in explainability. Today's datasets are typically larger and more complex, requiring less interpretable models. In the setting of post hoc explainability, there is no ground truth for explanations. Inspired by recent work in explaining image classifiers that does provide ground truth, we propose a similar solution for tabular data. Using copulas, a concise specification of the desired statistical properties of a dataset, users can build intuition around explainability using controlled data sets and experimentation. The current capabilities are demonstrated on three use cases: one-dimensional logistic regression, impact of correlation from informative features, and impact of correlation from redundant variables.
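To make the copula idea in the abstract concrete, here is a minimal sketch, not the paper's implementation. It assumes a two-feature Gaussian copula with correlation `rho`, a uniform marginal for the informative feature, an exponential marginal for the redundant one, and a logistic label that depends only on the informative feature. The function names, marginals, and coefficient are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u1, u2): uniform marginals, Gaussian dependence rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # 2x2 Cholesky factor written out by hand: corr(z1, z2) = rho.
        z1 = g1
        z2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * g2
        # The normal CDF maps each coordinate to a uniform on (0, 1).
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def make_dataset(n=2000, rho=0.8, coef=2.0, seed=0):
    """Rows of (x_informative, x_redundant, y); only x_informative drives y."""
    label_rng = random.Random(seed + 1)
    eps = 1e-12
    rows = []
    for u1, u2 in sample_gaussian_copula(n, rho, seed):
        u2 = min(max(u2, eps), 1.0 - eps)   # guard against log(0)
        x_informative = 6.0 * u1 - 3.0      # assumed marginal: Uniform(-3, 3)
        x_redundant = -math.log(1.0 - u2)   # assumed marginal: Exponential(1)
        # Ground truth: one-dimensional logistic model on x_informative.
        p = 1.0 / (1.0 + math.exp(-coef * x_informative))
        y = 1 if label_rng.random() < p else 0
        rows.append((x_informative, x_redundant, y))
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation, used to check the induced dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    data = make_dataset()
    x1 = [row[0] for row in data]
    x2 = [row[1] for row in data]
    print("feature correlation:", round(pearson(x1, x2), 3))
```

Because the label is generated from `x_informative` alone, the correct attribution for `x_redundant` is known by construction. Varying `rho` then shows how much importance a given explanation method assigns to a correlated but causally irrelevant feature.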

Code Implementations: 1 repo
