AI · May 4

An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES

Maciej Wisniewski, Bartosz Topolski, Pawel Dabrowski-Tumanski, Dariusz Plewczynski, Tomasz Jetka

arXiv:2605.02669

AI Analysis

For researchers in predictive toxicology and drug development, this work provides a new benchmark and an explainable approach for DILI prediction, though its hypothesis-generation performance remains modest (Hypothesis Alignment Fuzzy Jaccard Index of 0.16).

The authors argue that drug-induced liver injury (DILI) prediction should be framed as an explainable hypothesis-generation problem rather than as binary classification. They introduce the DILER Benchmark, which augments a curated set of molecules with mechanistic hepatotoxicity hypotheses, and present HADES, an agentic system. HADES achieves a ROC-AUC of 0.68 on the Test Set and 0.59 on the Post-2021 Set, outperforming DILI-Predictor (0.63 and 0.50, respectively). It also establishes a baseline for mechanistic hypothesis generation, with a Hypothesis Alignment Fuzzy Jaccard Index of 0.16.

Drug-induced liver injury (DILI) remains a leading cause of late-stage clinical trial attrition. However, existing computational predictors primarily rely on binary classification, a framing that limits generalization and yields no mechanistic insight to guide translational decisions. We argue that DILI prediction is better posed as an explainable hypothesis-generation problem. To support this shift, we introduce the DILER Benchmark, a dataset that extends beyond binary labels by augmenting a curated set of molecules with mechanistic hepatotoxicity hypotheses derived from biomedical literature. We further present HADES, an agentic system designed to generate transparent and auditable reasoning traces. By combining molecular-level predictions, metabolite decomposition, structural understanding, and toxicity pathway evidence, HADES mechanistically assesses DILI risk. Evaluated on the DILER Benchmark, HADES outperforms existing models in binary classification, achieving a ROC-AUC of 0.68 on the Test Set and 0.59 on the challenging Post-2021 Set, compared with 0.63 and 0.50 for DILI-Predictor, respectively. More importantly, we establish a baseline for mechanistic hypothesis generation, where HADES achieves a Hypothesis Alignment Fuzzy Jaccard Index of 0.16. This result underscores the inherent complexity of the task while highlighting the need for advanced explainable approaches in predictive toxicology.
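The abstract does not define the Hypothesis Alignment Fuzzy Jaccard Index, so the following is only a minimal sketch of one plausible reading: a Jaccard index in which predicted and reference mechanism labels count as overlapping when they are similar enough as strings, not only when they are identical. The function names, the `difflib`-based similarity, the greedy one-to-one matching, and the 0.8 threshold are all assumptions made for illustration and may differ from the metric the authors actually use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_jaccard(predicted, reference, threshold=0.8):
    """Soft Jaccard index between two collections of mechanism labels.

    Hypothetical definition: labels are matched one-to-one, greedily from
    most to least similar, and each matched pair contributes its similarity
    score to a "soft" intersection.
    """
    pred, ref = sorted(set(predicted)), sorted(set(reference))
    if not pred and not ref:
        return 1.0  # assumed convention: two empty sets agree fully

    # Score every (predicted, reference) pair, best matches first.
    pairs = sorted(
        ((similarity(p, r), i, j)
         for i, p in enumerate(pred)
         for j, r in enumerate(ref)),
        reverse=True,
    )

    used_pred, used_ref = set(), set()
    soft_intersection = 0.0
    for score, i, j in pairs:
        if score < threshold:
            break  # all remaining pairs are too dissimilar to count
        if i in used_pred or j in used_ref:
            continue  # each label may be matched at most once
        used_pred.add(i)
        used_ref.add(j)
        soft_intersection += score

    union = len(pred) + len(ref) - soft_intersection
    return soft_intersection / union


# Illustrative labels only; they are not taken from the DILER Benchmark.
predicted = ["Mitochondrial dysfunction", "oxidative stress"]
reference = ["mitochondrial dysfunction"]
print(fuzzy_jaccard(predicted, reference))
```

Under this definition the example scores 0.5: one label matches exactly after lowercasing, and the union holds two distinct mechanisms. A score near 0.16 would then mean that only a small fraction of the combined predicted and reference mechanisms align, which is consistent with the abstract's description of the task as inherently complex.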
