On The Reasons Behind Decisions
This work addresses the need for interpretability in machine learning, specifically for Boolean classifiers, by providing a formal framework for understanding the reasons behind decisions. It builds incrementally on existing methods for compiling classifiers into Boolean circuits.
The paper tackles the problem of explaining decisions made by Boolean classifiers. It defines sufficient, necessary, and complete reasons behind a decision and presents efficient algorithms, based on tractable Boolean circuits, for computing them, illustrated with a case study.
Recent work has shown that some common machine learning classifiers can be compiled into Boolean circuits that have the same input-output behavior. We present a theory for unveiling the reasons behind the decisions made by Boolean classifiers and study some of its theoretical and practical implications. We define notions such as sufficient, necessary, and complete reasons behind decisions, in addition to classifier and decision bias. We show how these notions can be used to evaluate counterfactual statements such as "a decision will stick even if ... because ... ." We present efficient algorithms for computing these notions, based on new advances in tractable Boolean circuits, and illustrate them using a case study.
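To make the notion of a sufficient reason concrete, the following is a minimal brute-force sketch, not the circuit-based algorithms the abstract refers to. It assumes one common reading of the definition: a sufficient reason is a minimal set of the instance's feature values that forces the same decision no matter how the remaining features are set. The toy classifier, the feature names (`exam`, `experience`, `referral`), and the helper functions are all hypothetical and chosen only for illustration. Enumeration is exponential in the number of features, so this is suitable only for tiny examples.

```python
from itertools import combinations, product

def classifier(x):
    # Hypothetical toy Boolean classifier (an assumption, not from the paper):
    # accept if the exam was passed and there is experience or a referral.
    return x["exam"] and (x["experience"] or x["referral"])

def is_sufficient(clf, instance, kept):
    """True if fixing the features in `kept` to their instance values
    forces the same decision for every setting of the other features."""
    decision = clf(instance)
    free = [f for f in instance if f not in kept]
    for values in product([False, True], repeat=len(free)):
        candidate = dict(instance)
        candidate.update(zip(free, values))
        if clf(candidate) != decision:
            return False
    return True

def sufficient_reasons(clf, instance):
    """Enumerate all minimal sufficient feature subsets by increasing size."""
    features = list(instance)
    reasons = []
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            kept = frozenset(subset)
            # Skip supersets of an already found reason: they are not minimal.
            if any(r <= kept for r in reasons):
                continue
            if is_sufficient(clf, instance, kept):
                reasons.append(kept)
    return reasons

instance = {"exam": True, "experience": True, "referral": True}
reasons = sufficient_reasons(classifier, instance)
# Features that appear in every sufficient reason (assumed reading of
# "necessary" for this illustration).
necessary = frozenset.intersection(*reasons)

for r in reasons:
    print(sorted(r))
print("necessary:", sorted(necessary))
```

Under these assumptions, the instance has two sufficient reasons, `{exam, experience}` and `{exam, referral}`, and `exam` is the only feature common to both. This supports a counterfactual of the kind quoted in the abstract: the decision will stick even if `referral` is flipped, because `exam` and `experience` already suffice.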