Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four
This work addresses a key challenge for XAI researchers by providing a more reliable and less manipulable framework for comparing attribution methods, though it is incremental in that it builds on existing game-theoretic concepts.
The paper tackles the problem of evaluating saliency attribution methods in Explainable AI by training characteristic functions as neural networks to play Connect Four, enabling direct comparison of methods through gameplay and eliminating the need for off-manifold evaluation.
One of the goals of Explainable AI (XAI) is to determine which input components were relevant for a classifier decision. This is commonly known as saliency attribution. Characteristic functions (from cooperative game theory) are able to evaluate partial inputs and form the basis for theoretically "fair" attribution methods like Shapley values. Given only a standard classifier function, it is unclear how partial input should be realised. Instead, most XAI-methods for black-box classifiers like neural networks consider counterfactual inputs that generally lie off-manifold. This makes them hard to evaluate and easy to manipulate. We propose a setup to directly train characteristic functions in the form of neural networks to play simple two-player games. We apply this to the game of Connect Four by randomly hiding colour information from our agents during training. This has three advantages for comparing XAI-methods: it removes the ambiguity about how to realise partial input, makes off-manifold evaluation unnecessary, and allows us to compare the methods by letting them play against each other.
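To make the link between characteristic functions and Shapley values concrete, here is a minimal sketch of the standard exact Shapley computation. It assumes a characteristic function `v` that maps any subset of input components (a frozenset of indices) to a number. The names `shapley_values` and `toy_v` are hypothetical and chosen for illustration only; the toy function stands in for a trained network and is not the paper's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(v, n):
    """Exact Shapley values for a characteristic function v over players 0..n-1.

    v takes a frozenset of player indices (the visible input components)
    and returns a real number. Cost is exponential in n, so this is only
    practical for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def toy_v(s):
    """Hypothetical characteristic function: pays 1 only if components 0 and 1 are both visible."""
    return 1.0 if {0, 1} <= s else 0.0


phi = shapley_values(toy_v, 3)
print(phi)
```

In this toy case components 0 and 1 share the credit equally and component 2 receives none, and the attributions sum to `v` of the full input minus `v` of the empty input. In the setup described above, `v` would instead be a network trained to act on boards with partially hidden colour information.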