DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
DISCRET addresses the need for trustworthy AI in critical settings like healthcare by providing faithful explanations without compromising accuracy, though the contribution is incremental because it builds on existing self-interpretable methods.
The paper tackles the challenge of designing accurate and faithful AI models for individual treatment effect estimation by proposing DISCRET, a self-interpretable framework that synthesizes rule-based explanations, achieving accuracy comparable to black-box models while outperforming other self-interpretable models.
Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect (ITE) estimation. ITE prediction models deployed in critical settings such as healthcare should ideally (i) be accurate and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.
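
To make the "explanation as database query" idea concrete, here is a minimal sketch. It is not taken from the DISCRET codebase. It treats a rule-based explanation as a conjunction of feature predicates, uses it to filter a dataset, and then summarizes the retrieved subgroup. The records, the feature names (`age`, `bmi`), the helper names, and the choice to estimate the effect as a difference of mean outcomes between treated and control samples are all assumptions made for illustration; the abstract does not specify how the subgroup is turned into an estimate.

```python
import operator
from statistics import mean

# Hypothetical toy records invented for this sketch (not data from the paper).
SAMPLES = [
    {"age": 62, "bmi": 31.0, "treated": 1, "outcome": 5.0},
    {"age": 58, "bmi": 29.5, "treated": 0, "outcome": 3.0},
    {"age": 65, "bmi": 33.0, "treated": 1, "outcome": 6.0},
    {"age": 70, "bmi": 30.2, "treated": 0, "outcome": 2.0},
    {"age": 34, "bmi": 22.0, "treated": 1, "outcome": 9.0},
    {"age": 29, "bmi": 24.1, "treated": 0, "outcome": 8.0},
]

# Comparison operators a predicate may use.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def matches(sample, rule):
    """Return True if the sample satisfies every (feature, op, value) predicate."""
    return all(OPS[op](sample[feature], value) for feature, op, value in rule)


def select_subgroup(samples, rule):
    """Use the rule as a query: keep only the samples it matches."""
    return [s for s in samples if matches(s, rule)]


def subgroup_effect(subgroup):
    """Assumed estimator: mean treated outcome minus mean control outcome.

    Returns None when the subgroup lacks treated or control samples.
    """
    treated = [s["outcome"] for s in subgroup if s["treated"] == 1]
    control = [s["outcome"] for s in subgroup if s["treated"] == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)


# A rule that is both a readable explanation and an executable query.
rule = [("age", ">", 55), ("bmi", ">=", 29.0)]
subgroup = select_subgroup(SAMPLES, rule)
effect = subgroup_effect(subgroup)
print(f"matched {len(subgroup)} samples, estimated effect = {effect}")
```

Because the estimate in this sketch is computed only from the samples the rule retrieves, the rule is by construction the reason for the prediction. That is one way to read the faithfulness property the abstract describes, as opposed to a post-hoc approximation of a separate model.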