Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis
This work addresses the problem of analyzing complex Earth data for researchers in environmental science and remote sensing, though it appears incremental as it applies an existing Gaussianization technique to new data domains.
The paper tackles the challenge of estimating information content in high-dimensional, heterogeneous Earth system data. It applies multivariate Gaussianization for robust probability density estimation, which enables the calculation of information-theoretic measures such as entropy and mutual information. Applications include radar backscattering and remote sensing, and the reported results confirm the method's validity.
Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data are high-dimensional, heterogeneous, and non-linear. In this paper, we apply multivariate Gaussianization for probability density estimation, which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures that characterize multivariate densities: information, entropy, total correlation, and mutual information. We demonstrate how these information-theoretic measures can be applied in various Earth system data analysis problems. First, we show how the method can be used to jointly Gaussianize radar backscattering intensities, synthesize hyperspectral data, and quantify the information content in aerial optical images. We also quantify the information content of several variables describing the soil-vegetation status in agro-ecosystems, and investigate the temporal scales that maximize their shared information under extreme events such as droughts. Finally, we measure the relative information content of space and time dimensions in remote sensing products and model simulations involving long records of key variables such as precipitation, sensible heat, and evaporation. Results confirm the validity of the method, for which we anticipate wide use and adoption. Code and demos of the implemented algorithms and information-theoretic measures are provided.
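As a minimal sketch of how Gaussianization can yield an information measure, the Python example below assumes a rotation-based iterative scheme: each iteration Gaussianizes every marginal through its ranks and then applies a PCA rotation. Total correlation is invariant under the marginal step, so the reduction achieved by each rotation can be accumulated from one-dimensional entropy estimates alone. The function names, the Vasicek spacing estimator, and the iteration count are illustrative assumptions, and this is not necessarily the implementation released with this work.

```python
import numpy as np
from scipy.stats import norm


def marginal_gaussianize(X):
    """Map each column to a standard normal through its empirical CDF (ranks)."""
    n = X.shape[0]
    # Double argsort gives ranks 1..n per column (assumes continuous data, no ties).
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    return norm.ppf(ranks / (n + 1.0))


def entropy_1d(x):
    """Vasicek spacing estimator of differential entropy, in nats (assumed choice)."""
    x = np.sort(x)
    n = x.size
    m = max(1, int(round(np.sqrt(n))))
    idx = np.arange(n)
    # Spacing between order statistics m steps apart, clamped at the sample ends.
    spacing = x[np.minimum(idx + m, n - 1)] - x[np.maximum(idx - m, 0)]
    return float(np.mean(np.log(n / (2.0 * m) * spacing)))


def gaussianize(X, n_iter=5):
    """Iteratively Gaussianize X (n_samples x n_dims, n_dims >= 2).

    Returns the transformed data and a total-correlation estimate in nats.
    """
    Z = np.asarray(X, dtype=float)
    total_corr = 0.0
    for _ in range(n_iter):
        Y = marginal_gaussianize(Z)                     # leaves total correlation unchanged
        _, V = np.linalg.eigh(np.cov(Y, rowvar=False))  # PCA rotation (orthonormal)
        Z = Y @ V                                       # leaves joint entropy unchanged
        # T(Y) - T(Z) = sum_i H(Y_i) - sum_i H(Z_i), since H(Y) = H(Z) under rotation.
        h_before = sum(entropy_1d(Y[:, j]) for j in range(Y.shape[1]))
        h_after = sum(entropy_1d(Z[:, j]) for j in range(Z.shape[1]))
        total_corr += h_before - h_after
    return Z, total_corr


# Usage on synthetic data: a bivariate Gaussian with correlation rho, whose
# analytic total correlation is -0.5 * log(1 - rho**2).
rng = np.random.default_rng(0)
rho = 0.8
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
Z, tc = gaussianize(X, n_iter=3)
print("estimated:", tc, "analytic:", -0.5 * np.log(1.0 - rho**2))
```

For two variables, total correlation coincides with mutual information, so the same accumulation gives a mutual information estimate for a pair of scalar variables. The estimate carries the bias and variance of the one-dimensional entropy estimator, which accumulate over iterations, so the iteration count is worth checking against synthetic data with a known answer.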