LG, AI | Nov 1, 2023

Learning impartial policies for sequential counterfactual explanations using Deep Reinforcement Learning

arXiv:2311.00523v1 | 2.0 | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses bias in explainable AI, which matters to users who need fair and scalable counterfactual explanations. The contribution is incremental, as it builds on existing reinforcement learning methods.

The paper tackles the problem of bias in sequential counterfactual explanation policies by proposing a method that uses classifier output probabilities to create a more informative reward, resulting in improved impartiality.

In the field of explainable Artificial Intelligence (XAI), sequential counterfactual (SCF) examples are often used to alter the decision of a trained classifier by applying a sequence of modifications to the input instance. Although certain test-time algorithms optimize for each new instance individually, Reinforcement Learning (RL) methods have recently been proposed that learn policies for discovering SCFs, thereby enhancing scalability. As is typical in RL, the formulation of the RL problem, including the specification of the state space, actions, and rewards, can often be ambiguous. In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions. To mitigate this effect, we propose using the output probabilities of the classifier to create a more informative reward.
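The difference between a decision-only reward and a probability-based reward can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's formulation: the toy logistic classifier, the function names (`toy_classifier_proba`, `sparse_reward`, `probability_reward`, `apply_action`), and the choice to reward the change in target-class probability between consecutive states.

```python
import math
from typing import Callable, List, Sequence

ProbaFn = Callable[[Sequence[float]], float]


def toy_classifier_proba(x: Sequence[float]) -> float:
    """Hypothetical classifier: probability of the desired class.

    A fixed logistic model stands in for the trained classifier.
    """
    weights = (1.5, -0.5)
    bias = -1.0
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def apply_action(x: Sequence[float], feature_index: int, delta: float) -> List[float]:
    """Return a copy of x with one feature shifted by delta (one SCF step)."""
    x_new = list(x)
    x_new[feature_index] += delta
    return x_new


def sparse_reward(proba_fn: ProbaFn, x_next: Sequence[float],
                  threshold: float = 0.5) -> float:
    """Decision-only reward: 1 when the classifier's decision flips, else 0."""
    return 1.0 if proba_fn(x_next) >= threshold else 0.0


def probability_reward(proba_fn: ProbaFn, x_prev: Sequence[float],
                       x_next: Sequence[float], action_cost: float = 0.0) -> float:
    """Assumed dense reward: gain in target-class probability minus a cost."""
    return proba_fn(x_next) - proba_fn(x_prev) - action_cost


if __name__ == "__main__":
    x0 = [0.0, 0.0]
    helpful = apply_action(x0, feature_index=0, delta=0.5)
    harmful = apply_action(x0, feature_index=1, delta=0.5)
    for name, x1 in [("helpful", helpful), ("harmful", harmful)]:
        print(name,
              "sparse:", sparse_reward(toy_classifier_proba, x1),
              "dense:", round(probability_reward(toy_classifier_proba, x0, x1), 3))
```

In this toy setting neither single step flips the decision, so the decision-only reward is zero for both actions and gives the agent no signal to tell them apart. The probability-based reward is positive for the step that moves toward the decision boundary and negative for the one that moves away, which is the kind of additional information the abstract describes.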
