Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control
This work addresses the challenge of ensuring that rationalization mechanisms in machine learning models provide accurate and complete justifications, which matters for interpretability in domains such as healthcare and finance. The contribution is incremental, improving on existing cooperative frameworks.
The paper tackles the problem of compromised cooperative rationalization in predictive models in two ways: it introduces an introspective model that incorporates outcome predictions into feature selection, and it controls the rationale complement with an adversary. Together these maintain high predictive accuracy and yield comprehensive rationales.
Selective rationalization has become a common mechanism to ensure that predictive models reveal how they use any available features. The selection may be soft or hard, and identifies a subset of input features relevant for prediction. The setup can be viewed as a cooperative game between the selector (also known as the rationale generator) and the predictor, which makes use of only the selected features. The cooperative setting may, however, be compromised for two reasons. First, the generator typically has no direct access to the outcome it aims to justify, resulting in poor performance. Second, there is typically no control exerted on the information left outside the selection. We revise the overall cooperative framework to address these challenges. We introduce an introspective model which explicitly predicts and incorporates the outcome into the selection process. Moreover, we explicitly control the rationale complement via an adversary so as not to leave any useful information out of the selection. We show that the two complementary mechanisms both maintain high predictive accuracy and lead to comprehensive rationales.
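The two mechanisms described above can be illustrated with a minimal sketch. It is not the paper's implementation. The per-token relevance scores, the top-k hard selection, the function names, and the simple difference-of-losses objective are all assumptions chosen to keep the example small. The sketch shows a generator that conditions its selection on its own predicted label (introspection), and a generator objective that improves when a predictor reading only the complement is uninformed about the label (complement control).

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(probs[label])


def introspective_mask(token_scores, predicted_label, k):
    """Hard selection conditioned on the generator's own predicted label.

    token_scores[i][c] is a hypothetical relevance of token i to label c.
    Returns a 0/1 mask that keeps the k tokens most relevant to predicted_label.
    """
    ranked = sorted(
        range(len(token_scores)),
        key=lambda i: token_scores[i][predicted_label],
        reverse=True,
    )
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(token_scores))]


def split_by_mask(tokens, mask):
    """Split the input into the rationale (mask 1) and its complement (mask 0)."""
    rationale = [t for t, m in zip(tokens, mask) if m == 1]
    complement = [t for t, m in zip(tokens, mask) if m == 0]
    return rationale, complement


def generator_loss(predictor_probs, complement_probs, label, weight=1.0):
    """Assumed objective: reward an accurate predictor on the rationale and
    penalize a complement predictor (the adversary) that can still recover
    the label from the tokens left outside the selection."""
    return cross_entropy(predictor_probs, label) - weight * cross_entropy(
        complement_probs, label
    )


# Toy input: label 0 is "negative", label 1 is "positive".
tokens = ["great", "taste", "but", "awful", "smell"]
token_scores = [[0.1, 0.9], [0.2, 0.6], [0.1, 0.1], [0.9, 0.0], [0.5, 0.2]]

mask = introspective_mask(token_scores, predicted_label=1, k=2)
rationale, complement = split_by_mask(tokens, mask)

# The same rationale predictor output, paired with two different adversaries.
loss_uninformed = generator_loss([0.1, 0.9], [0.5, 0.5], label=1)
loss_informed = generator_loss([0.1, 0.9], [0.1, 0.9], label=1)
```

In this toy setting, changing the predicted label changes which tokens are selected, which is the point of giving the generator access to the outcome. The generator's loss is lower when the complement predictor is at chance than when it still recovers the label, so a selection that leaves useful information outside the rationale is discouraged.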