Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications
This work addresses the growing demand for Explainable AI by helping researchers and practitioners understand nonlinear machine learning models, particularly deep neural networks. Its contribution is incremental: it synthesizes existing knowledge rather than introducing new methods.
The paper provides a comprehensive review of methods and applications for explaining deep neural networks and other machine learning models, focusing on post-hoc explanations, theoretical foundations, comparative evaluations, and best practices for integrating interpretability into standard workflows.
With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for Explainable AI. Interpretability and explanation methods for gaining a better understanding of the problem-solving abilities and strategies of nonlinear machine learning models, in particular deep neural networks, are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field, with a focus on 'post-hoc' explanations, and explain its theoretical foundations, (2) put interpretability algorithms to the test from both a theoretical and a comparative evaluation perspective using extensive simulations, (3) outline best practices, i.e., how best to include interpretation methods in the standard usage of machine learning, and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning.