Encoding Causal Macrovariables
This work addresses the need for automated macrovariable detection in scientific disciplines that rely on high-dimensional data. The proposed method can be adapted to different scientific goals, but the contribution appears incremental, as it builds on existing causal modeling concepts.
The paper tackles the problem of automatically detecting suitable macrovariables for coarse-grained causal models from high-dimensional observational data, introducing a novel algorithmic approach inspired by information bottlenecks. It demonstrates robust detection of ground-truth variables in synthetic data and identifies the known El Niño variations in a real climate dataset.
In many scientific disciplines, coarse-grained causal models are used to explain and predict the dynamics of more fine-grained systems. Naturally, such models require appropriate macrovariables. Automated procedures for detecting suitable variables would help leverage the increasingly available high-dimensional observational datasets. This work introduces a novel algorithmic approach that is inspired by a new characterisation of causal macrovariables as information bottlenecks between microstates. Its general form can be adapted to the needs of different scientific goals. After a further transformation step, the causal relationships between the learned variables can be investigated through additive noise models. Experiments on both simulated data and a real climate dataset are reported. In a synthetic dataset, the algorithm robustly detects the ground-truth variables and correctly infers the causal relationships between them. In the real climate dataset, the algorithm robustly detects two variables that correspond to the two known variations of the El Niño phenomenon.
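The abstract does not specify the transformation step or which regression and independence test are used with the additive noise models, so the following is only a minimal sketch of a generic bivariate additive-noise-model direction test, not the authors' procedure. It assumes the two learned macrovariables are already available as scalar samples `x` and `y`, and it assumes polynomial regression plus a biased HSIC estimate with Gaussian kernels. The idea: if `y = f(x) + noise` with noise independent of `x`, then residuals of regressing `y` on `x` should be independent of `x`, while the reverse regression generally leaves dependent residuals. The function names `hsic`, `residual_dependence` and `anm_direction` are hypothetical.

```python
import numpy as np


def _gaussian_gram(v: np.ndarray) -> np.ndarray:
    """Gaussian-kernel Gram matrix; the bandwidth is set from the median squared pairwise distance."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    sq_dists = (v - v.T) ** 2
    sigma2 = np.median(sq_dists[sq_dists > 0])
    return np.exp(-sq_dists / (2.0 * sigma2))


def hsic(a: np.ndarray, b: np.ndarray) -> float:
    """Biased empirical HSIC estimate; values near zero suggest independence."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k, l = _gaussian_gram(a), _gaussian_gram(b)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def residual_dependence(cause: np.ndarray, effect: np.ndarray, degree: int = 3) -> float:
    """Fit effect = f(cause) + noise by polynomial regression; return HSIC(cause, residuals)."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def anm_direction(x: np.ndarray, y: np.ndarray, degree: int = 3) -> str:
    """Prefer the direction whose residuals are less dependent on the putative cause."""
    x = (x - x.mean()) / x.std()  # standardise so both directions are compared on the same scale
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y, degree)   # model x -> y
    backward = residual_dependence(y, x, degree)  # model y -> x
    return "x -> y" if forward < backward else "y -> x"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 400)
    y = x**3 + rng.uniform(-1, 1, 400)  # nonlinear mechanism, non-Gaussian additive noise
    print(anm_direction(x, y))
```

Comparing raw HSIC values in the two directions is a simplification chosen for brevity. A fuller additive-noise-model analysis would use a more flexible regressor (for example Gaussian process regression) and a proper significance test for residual independence in each direction, declining to decide when both or neither direction is rejected.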