ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
For researchers evaluating deep learning models for drug-target interaction prediction, ISAAC offers a way to detect reliance on spurious features that accuracy metrics miss.
Deep learning models for drug-target interaction prediction often achieve strong benchmark performance without necessarily relying on mechanistically meaningful features. ISAAC, a post-hoc auditing framework, reveals relative differences of approximately 25% in causal reasoning scores across models with comparable AUROC (within about 3%), exposing a limitation of standard accuracy-based evaluation.
Deep learning models for drug-target interaction (DTI) prediction often achieve strong benchmark performance without necessarily relying on mechanistically meaningful molecular features, a limitation that standard accuracy-based evaluation cannot detect. We introduce ISAAC (Intervention-based Structural Auditing Approach for Causal Reasoning), a post-hoc framework that evaluates prior-relative structural sensitivity by probing frozen models with matched mechanistic and spurious input-level interventions, independently of predictive accuracy. Applied to three sequence-based DTI architectures on the Davis benchmark, ISAAC reveals relative differences of approximately 25% in reasoning scores across models whose AUROC values lie within about 3% of one another. These differences remain stable across training seeds, intervention seeds, and two distinct perturbation operators. Because such discrepancies are invisible to conventional accuracy metrics, they motivate post-hoc structural auditing as a complement to standard performance evaluation in scientific machine learning for molecular modeling.
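The abstract describes the audit only at a high level. The sketch below illustrates the general shape of a matched-intervention probe on a frozen model, under loudly stated assumptions: the perturbation operator (random residue substitution), the binding-site positions, and the `reasoning_score` formula (mechanistic sensitivity as a share of total sensitivity) are hypothetical stand-ins, not ISAAC's actual definitions. `site_model` and `shortcut_model` are toy functions, not DTI architectures, and the drug string is an unused placeholder.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A frozen DTI model is treated as a black box: (drug SMILES, protein sequence) -> score.
Model = Callable[[str, str], float]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitute(seq: str, positions: Sequence[int], rng: random.Random) -> str:
    """Hypothetical perturbation operator: swap each chosen residue for a different amino acid."""
    chars = list(seq)
    for i in positions:
        chars[i] = rng.choice([a for a in AMINO_ACIDS if a != chars[i]])
    return "".join(chars)


def matched_sensitivity(model: Model, drug: str, target: str, site: Sequence[int],
                        n_edits: int, n_draws: int, rng: random.Random) -> Tuple[float, float]:
    """Mean |change in prediction| under mechanistic (in-site) vs spurious (off-site) edits.

    Both arms use the same operator and the same number of edits, so they differ
    only in *where* the input is changed.
    """
    site_set = set(site)
    in_site: List[int] = sorted(site_set)
    off_site: List[int] = [i for i in range(len(target)) if i not in site_set]
    base = model(drug, target)  # the model is only queried, never retrained
    mech = spur = 0.0
    for _ in range(n_draws):
        mech += abs(model(drug, substitute(target, rng.sample(in_site, n_edits), rng)) - base)
        spur += abs(model(drug, substitute(target, rng.sample(off_site, n_edits), rng)) - base)
    return mech / n_draws, spur / n_draws


def reasoning_score(mech: float, spur: float, eps: float = 1e-12) -> float:
    """Assumed score in [0, 1]: share of total sensitivity attributable to the mechanistic prior."""
    return mech / (mech + spur + eps)


# --- Toy demo: two hand-written "models" and a hypothetical binding site ---
SITE = list(range(10, 20))
HYDROPHOBIC = set("AVILMFWY")


def site_model(drug: str, target: str) -> float:
    # Reads only residues inside the assumed binding site.
    return sum(target[i] in HYDROPHOBIC for i in SITE) / len(SITE)


def shortcut_model(drug: str, target: str) -> float:
    # Reads only residues outside the site: a stand-in for a spurious shortcut.
    off = [c for i, c in enumerate(target) if i not in SITE]
    return sum(c in HYDROPHOBIC for c in off) / len(off)


rng = random.Random(0)
target = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))
scores = {}
for name, model in [("site_model", site_model), ("shortcut_model", shortcut_model)]:
    mech, spur = matched_sensitivity(model, "CCO", target, SITE, n_edits=3, n_draws=50, rng=rng)
    scores[name] = reasoning_score(mech, spur)
print(scores)
```

Matching the two arms (same operator, same number of edits) is what lets the score attribute sensitivity to the location of an intervention rather than to its strength. In this toy setting `site_model` scores close to 1 and `shortcut_model` close to 0, and at no point does the procedure consult labels or accuracy.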