Counterfactual Explanation with Multi-Agent Reinforcement Learning for Drug Target Prediction
This work addresses the need for explainable AI in drug discovery to enhance trust and extract biological insights, though it is incremental as it extends existing counterfactual methods to a specific domain with dual inputs.
The paper tackles the problem of generating counterfactual explanations for drug-target affinity (DTA) models, which are black boxes that lack interpretability. It proposes a multi-agent reinforcement learning framework (MACDA) that optimizes the drug and target inputs simultaneously; on the Davis dataset, this yields more parsimonious explanations with no loss in validity.
Motivation: Many high-performance DTA models have been proposed, but most are black boxes and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy and can also enable scientists to distill biological knowledge from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network; it works by systematically answering the question "How would the model output change if the inputs were changed in this way?". Most counterfactual explanation methods operate on a single input only. It remains an open problem how to extend counterfactual-based XAI methods to DTA models, which take two inputs, one for the drug and one for the target, both of which are discrete in nature.
Methods: We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. The framework provides human-interpretable counterfactual instances while optimizing the input drug and the input target for counterfactual generation at the same time.
Results: We benchmark the proposed MACDA framework on the Davis dataset and find that it produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity and QED. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain a DTA model's prediction in terms of the underlying substructure interactions between its two inputs, revealing mechanisms that align with prior domain knowledge.
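To make the counterfactual question from the Motivation concrete for a model with two discrete inputs, here is a minimal sketch in Python. It is not MACDA and uses no reinforcement learning: it brute-forces single-token substitutions on a toy "drug" and a toy "target" and keeps the joint edit that changes the prediction most with the fewest edits. The vocabularies, the `toy_affinity` scorer, and all function names are hypothetical stand-ins chosen for illustration.

```python
from itertools import product

# Hypothetical token vocabularies; a real DTA setting would use molecular
# substructures for the drug and amino-acid residues for the target.
DRUG_VOCAB = ("A", "B", "C")
TARGET_VOCAB = ("x", "y", "z")

# Hypothetical "binding" pairs used by the toy model below.
MATCHING_PAIRS = {("A", "x"), ("B", "y"), ("C", "z")}


def toy_affinity(drug, target):
    """Stand-in black-box model: counts position-wise matching pairs."""
    return sum((d, t) in MATCHING_PAIRS for d, t in zip(drug, target))


def single_edits(seq, vocab):
    """Yield the unchanged sequence plus every one-token substitution."""
    seq = tuple(seq)
    yield seq
    for i, tok in enumerate(seq):
        for new in vocab:
            if new != tok:
                yield seq[:i] + (new,) + seq[i + 1:]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_counterfactual(drug, target):
    """Search joint (drug, target) edits for the largest prediction change.

    Ties on prediction change are broken in favour of fewer edits, which is
    a crude proxy for a parsimonious explanation.
    """
    drug, target = tuple(drug), tuple(target)
    base = toy_affinity(drug, target)
    best_key = None
    candidates = product(single_edits(drug, DRUG_VOCAB),
                         single_edits(target, TARGET_VOCAB))
    for d, t in candidates:
        cost = hamming(d, drug) + hamming(t, target)
        if cost == 0:
            continue  # the original pair is not a counterfactual
        change = abs(toy_affinity(d, t) - base)
        key = (-change, cost, d, t)
        if best_key is None or key < best_key:
            best_key = key
    neg_change, cost, d, t = best_key
    return d, t, -neg_change, cost


cf_drug, cf_target, change, cost = best_counterfactual("ABC", "xyz")
print(cf_drug, cf_target, change, cost)
```

In this toy setting, editing only one input can shift the score by at most one, while editing both inputs at different positions shifts it by two. That is the kind of joint effect a single-input counterfactual method cannot surface, and it is the reason a framework would want to optimize both inputs together.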