Decomposing neural networks as mappings of correlation functions
This work offers an incremental theoretical framework for explaining information processing in trained neural networks, primarily benefiting researchers in interpretable AI.
The researchers tackled the challenge of understanding how trained deep neural networks process information by characterizing a feed-forward network as a mapping between probability distributions, in which each layer's non-linearity transfers information between different orders of correlation functions. On an XOR task and on MNIST, they showed that correlations up to second order predominantly capture the processing in the internal layers, while the input layer also extracts higher-order correlations from the data, which yields a quantitative explanation of classification.
Understanding the functional principles of information processing in deep neural networks continues to be a challenge, in particular for networks with trained and thus non-random weights. To address this issue, we study the mapping between probability distributions implemented by a deep feed-forward network. We characterize this mapping as an iterated transformation of distributions, where the non-linearity in each layer transfers information between different orders of correlation functions. This allows us to identify essential statistics in the data, as well as different information representations that can be used by neural networks. Applied to an XOR task and to MNIST, we show that correlations up to second order predominantly capture the information processing in the internal layers, while the input layer also extracts higher-order correlations from the data. This analysis provides a quantitative and explainable perspective on classification.
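To make the statement that the non-linearity "transfers information between different orders of correlation functions" concrete, the following is a minimal numerical sketch, not the method used in this work. It assumes a hypothetical quadratic activation phi(z) = z + alpha * z**2, chosen only because its Gaussian moments have a closed form, and two illustrative classes of scalar pre-activations that share the same mean but differ in variance. All function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def quadratic_activation(z, alpha=0.5):
    """Hypothetical activation phi(z) = z + alpha * z**2 (illustrative choice)."""
    return z + alpha * z**2

def output_mean_gaussian(mu, var, alpha=0.5):
    """Exact mean of phi(z) for z ~ N(mu, var): mu + alpha * (mu**2 + var)."""
    return mu + alpha * (mu**2 + var)

def empirical_cumulants(x):
    """First three cumulants of a 1-D sample: mean, variance, third central moment."""
    m = x.mean()
    c = x - m
    return m, (c**2).mean(), (c**3).mean()

rng = np.random.default_rng(0)
n = 200_000

# Two classes with identical means (0) but different variances (1 and 4).
# Before the non-linearity they are indistinguishable at first order.
z_a = rng.normal(loc=0.0, scale=1.0, size=n)
z_b = rng.normal(loc=0.0, scale=2.0, size=n)

mean_a, var_a, k3_a = empirical_cumulants(quadratic_activation(z_a))
mean_b, var_b, k3_b = empirical_cumulants(quadratic_activation(z_b))

# After the non-linearity the class means differ: second-order information
# in the input has been moved into the first-order statistic of the output.
print("output means (empirical):", mean_a, mean_b)
print("output means (analytic): ",
      output_mean_gaussian(0.0, 1.0), output_mean_gaussian(0.0, 4.0))

# The output is also no longer Gaussian: its third cumulant is non-zero,
# so the non-linearity generates higher-order correlations as well.
print("output third cumulants (empirical):", k3_a, k3_b)
```

In this toy setting a linear readout of the output mean can separate the two classes even though their input means coincide, which is one simple way in which class information carried by second-order statistics can become accessible at first order after a single layer.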