Interpretable Models in ANNs
This work addresses the problem of understanding complex neural network models. It is aimed at researchers and practitioners who need transparent and interpretable AI systems, especially in scientific domains where the underlying physical laws may be simple.
This paper explores methods to interpret artificial neural networks and extract human-readable equations that describe the model, particularly when the underlying pattern can be described by relatively simple mathematical expressions, such as laws of physics. The goal is to move beyond black-box models toward more transparent and understandable representations.
Artificial neural networks are often too complex and too deep for a human to understand, so they are usually referred to as black boxes. For many real-world problems, the underlying pattern is itself so complicated that no analytic solution exists. In some cases, however, such as the laws of physics, the pattern can be described by relatively simple mathematical expressions. In those cases, we want a readable equation rather than a black box. In this paper, we look for a way to explain a network and extract a human-readable equation that describes the model.