Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data
This work addresses the challenge of selecting faithful interpretability methods for genomic sequence tasks, providing quantitative guidance for researchers in computational biology and genomics.
The paper tackled the problem of evaluating the fidelity of attribution methods used to interpret neural networks on genomic data by proposing a computational approach based on point mutations, finding that Layerwise Relevance Propagation (LRP) was the most appropriate method for translation initiation and that it identified key biological features.
Even though deep neural networks (DNNs) achieve state-of-the-art results on a number of problems involving genomic data, getting DNNs to explain their decision-making process has been a major challenge because of their black-box nature. One way to have DNNs explain their predictions is through attribution methods, which are assumed to highlight the parts of the input that contribute most to the prediction. Given the large number of attribution methods and the lack of quantitative results on their fidelity, attribution methods for sequence-based tasks have mostly been selected qualitatively. In this work, we take a step towards identifying the most faithful attribution method by proposing a computational approach that uses point mutations. Providing quantitative results on seven popular attribution methods, we find Layerwise Relevance Propagation (LRP) to be the most appropriate one for translation initiation, with LRP identifying two biological features important for translation: the integrity of the Kozak sequence and the detrimental effect of premature stop codons.