Concept-Based Explanations for Tabular Data
This work addresses the lack of concept-based explanations for tabular data. The contribution is incremental, as it adapts an existing method to a new domain.
The authors extended the TCAV concept attribution method to tabular data by defining concepts for this domain, showing validity on synthetic and real-world datasets with results matching human intuitions. They also proposed a TCAV-based fairness notion to quantify biased representations in DNN layers and empirically linked it to Demographic Parity.
The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems. One particular approach is to attribute model decisions to high-level concepts that humans can understand. However, such concept-based explainability for Deep Neural Networks (DNNs) has been studied mostly in the image domain. In this paper, we extend TCAV, a concept attribution approach, to tabular learning by proposing a way to define concepts over tabular data. On a synthetic dataset with ground-truth concept explanations and on a real-world dataset, we show the validity of our method in generating interpretability results that match human intuition. In addition, we propose a notion of fairness based on TCAV that quantifies which layer of a DNN has learned representations that lead to biased predictions. Finally, we empirically demonstrate the relation of TCAV-based fairness to a group fairness notion, Demographic Parity.
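To make the two quantities named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that layer activations and the gradients of the model output with respect to those activations have already been extracted as NumPy arrays. The function names (`mean_difference_cav`, `tcav_score`, `demographic_parity_gap`) are hypothetical. The concept activation vector here is approximated by a normalized difference of means, a simplification of the usual TCAV recipe, which takes the normal of a linear classifier trained to separate concept examples from random examples.

```python
import numpy as np


def mean_difference_cav(concept_acts, random_acts):
    """Approximate a concept activation vector (CAV).

    Assumption: a normalized difference of mean activations stands in for
    the linear-classifier normal that TCAV normally uses.
    """
    concept_acts = np.asarray(concept_acts, dtype=float)
    random_acts = np.asarray(random_acts, dtype=float)
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def tcav_score(grads, cav):
    """Fraction of examples whose directional derivative along the CAV is positive.

    grads: (n_examples, n_units) gradients of the class output with respect
           to the activations of the chosen layer (assumed precomputed).
    """
    grads = np.asarray(grads, dtype=float)
    directional = grads @ np.asarray(cav, dtype=float)
    return float((directional > 0).mean())


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    rate_0 = y_pred[group == 0].mean()
    rate_1 = y_pred[group == 1].mean()
    return float(abs(rate_0 - rate_1))
```

Under these assumptions, a layer-wise analysis would call `tcav_score` once per layer with a CAV built from a protected-attribute concept, and compare the resulting scores against `demographic_parity_gap` computed on the model's predictions.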