Using Causal Discovery to Track Information Flow in Spatio-Temporal Data - A Testbed and Experimental Results Using Advection-Diffusion Simulations
This work addresses a validation gap for geoscientists who use causal methods to study dynamical processes such as advection and diffusion, although the contribution is incremental: it applies existing algorithms to simulated data.
The paper tackles the lack of ground truth for evaluating causal discovery algorithms in geoscience by developing a testbed based on advection-diffusion simulations on a 2D grid. It applies the algorithms to the simulated data to assess their performance and to interpret the resulting graphs, and it makes the data sets available as a benchmark.
Causal discovery algorithms based on probabilistic graphical models have emerged in geoscience applications for the identification and visualization of dynamical processes. The key idea is to learn the structure of a graphical model from observed spatio-temporal data; this structure indicates information flow, and thus pathways of interaction, in the observed physical system. Studying those pathways allows geoscientists to learn subtle details about the underlying dynamical mechanisms governing our planet. Initial studies using this approach on real-world atmospheric data have shown great potential for scientific discovery. However, no ground truth was available in these initial studies, so the resulting graphs were evaluated only by whether a domain expert judged them to be physically plausible. This paper seeks to fill this gap. We develop a testbed that emulates two dynamical processes dominant in many geoscience applications, namely advection and diffusion, on a 2D grid. We then apply the causal-discovery-based information tracking algorithms to the simulation data to study how well the algorithms work in different scenarios and to gain a better understanding of the physical meaning of the graph results, in particular of instantaneous connections. We make all data sets used in this study available to the community as a benchmark.
Keywords: Information flow, graphical model, structure learning, causal discovery, geoscience.
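To make the idea of an advection-diffusion testbed on a 2D grid concrete, the following is a minimal sketch of how such simulation data could be generated. It is not the testbed used in this study: the explicit finite-difference scheme (first-order upwind advection, five-point Laplacian diffusion), the periodic boundaries, the Gaussian initial condition, and all names and parameter values (`step`, `simulate`, `u`, `v`, `kappa`, `dx`, `dt`) are assumptions chosen for illustration.

```python
import numpy as np


def step(c, u, v, kappa, dx=1.0, dt=0.1):
    """Advance the field c by one explicit time step on a periodic grid.

    Illustrative assumption: constant velocities u >= 0 and v >= 0, so a
    first-order upwind difference looks "backwards" along each axis.
    """
    # Upwind differences for the advection terms (axis 1 is x, axis 0 is y).
    dcdx = (c - np.roll(c, 1, axis=1)) / dx
    dcdy = (c - np.roll(c, 1, axis=0)) / dx
    # Five-point Laplacian for the diffusion term.
    lap = (
        np.roll(c, 1, axis=0) + np.roll(c, -1, axis=0)
        + np.roll(c, 1, axis=1) + np.roll(c, -1, axis=1)
        - 4.0 * c
    ) / dx**2
    return c + dt * (-u * dcdx - v * dcdy + kappa * lap)


def simulate(n=32, steps=50, u=1.0, v=0.0, kappa=0.1, dt=0.1):
    """Return an array of shape (steps, n, n) holding the field at each step."""
    y, x = np.mgrid[0:n, 0:n]
    # Hypothetical initial condition: a Gaussian blob at the grid centre.
    c = np.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / (2.0 * 2.0**2))
    series = np.empty((steps, n, n))
    for t in range(steps):
        series[t] = c  # store the state before stepping, so series[0] is the initial field
        c = step(c, u, v, kappa, dt=dt)
    return series


def x_centroid(field):
    """Mass-weighted mean x position of a 2D field."""
    x = np.arange(field.shape[1])
    return (field.sum(axis=0) * x).sum() / field.sum()


series = simulate()
```

Each grid point of `series` yields one time series, and a structure learning algorithm would take these time series as its input variables. Because the flow direction and speed are set by `u` and `v`, the expected direction of information flow is known in advance, which is what real-world atmospheric data lack. With the default values the scheme is numerically stable (`u*dt/dx = 0.1`, `kappa*dt/dx**2 = 0.01`); larger time steps or velocities would require checking stability again.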