Towards Learning and Explaining Indirect Causal Effects in Neural Networks
This work addresses the need for more accurate causal effect estimation in neural networks, which matters for interpretability in fields such as healthcare and finance. The contribution is incremental, since it extends existing causal frameworks.
The paper tackles the problem of learning and explaining indirect causal effects in neural networks, which previous methods ignored by assuming independence among input variables. It demonstrates that the proposed ante-hoc method approximates ground-truth effects better than existing methods on synthetic and real-world datasets.
Recently, there has been a growing interest in learning and explaining causal effects within Neural Network (NN) models. By virtue of NN architectures, previous approaches consider only direct and total causal effects assuming independence among input variables. We view an NN as a structural causal model (SCM) and extend our focus to include indirect causal effects by introducing feedforward connections among input neurons. We propose an ante-hoc method that captures and maintains direct, indirect, and total causal effects during NN model training. We also propose an algorithm for quantifying learned causal effects in an NN model and efficient approximation strategies for quantifying causal effects in high-dimensional data. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the causal effects learned by our ante-hoc method better approximate the ground truth effects compared to existing methods.
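To make the distinction between direct, indirect, and total effects concrete, the following is a minimal sketch on a toy linear SCM. It is not the paper's method or algorithm: the variables `x1`, `x2`, `y`, the coefficients `A`, `B`, `C`, and all function names are hypothetical and chosen only for illustration. Here `x1` influences `y` both directly and through `x2`, which is the kind of dependence among inputs that an independence assumption would miss.

```python
# Toy linear SCM (hypothetical, not taken from the paper):
#   x2 := A * x1          (x1 influences another input)
#   y  := B * x1 + C * x2 (output depends on both inputs)
A, B, C = 2.0, 0.5, 3.0


def x2_of(x1):
    """Structural equation for x2 given x1."""
    return A * x1


def y_of(x1, x2):
    """Structural equation for y given both inputs."""
    return B * x1 + C * x2


def total_effect(x1_base, x1_new):
    """Change in y when x1 is intervened on and x2 is allowed to respond."""
    return y_of(x1_new, x2_of(x1_new)) - y_of(x1_base, x2_of(x1_base))


def direct_effect(x1_base, x1_new):
    """Change in y when x1 changes but x2 is held at its baseline value."""
    x2_fixed = x2_of(x1_base)
    return y_of(x1_new, x2_fixed) - y_of(x1_base, x2_fixed)


def indirect_effect(x1_base, x1_new):
    """Change in y carried only through x2, with the direct path held at baseline."""
    return y_of(x1_base, x2_of(x1_new)) - y_of(x1_base, x2_of(x1_base))


te = total_effect(0.0, 1.0)
de = direct_effect(0.0, 1.0)
ie = indirect_effect(0.0, 1.0)
print(te, de, ie)  # prints: 6.5 0.5 6.0
```

In this linear toy case the total effect decomposes exactly into the direct and indirect parts (6.5 = 0.5 + 6.0). A model that treats `x1` and `x2` as independent would only see the direct path, so it could not distinguish 0.5 from 6.5 as the effect of `x1`. The exact additive decomposition is a property of the linear example and should not be assumed for a general neural network.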