Random Sampling High Dimensional Model Representation Gaussian Process Regression (RS-HDMR-GPR) for representing multidimensional functions with machine-learned lower-dimensional terms allowing insight with a general method
The method is a general tool for representing multidimensional functions and for estimating the relative importance of input variables and their combinations. It is relevant to work with complex, high-dimensional data in scientific and financial applications.
This paper introduces a Python implementation of RS-HDMR-GPR, a method for representing multivariate functions with lower-dimensional terms. The code supports recovering functional dependence from sparse data, imputing missing values, and pruning HDMR terms. Its capabilities are demonstrated on synthetic functions, molecular potential energy surfaces, kinetic energy densities of materials, and financial market data.
We present a Python implementation of RS-HDMR-GPR (Random Sampling High Dimensional Model Representation Gaussian Process Regression). The method builds representations of multivariate functions from lower-dimensional terms, either as an expansion over orders of coupling or using terms of only a given dimensionality. In particular, this facilitates recovering functional dependence from sparse data. The code also allows for imputation of missing values of the variables and for significant pruning of the number of HDMR terms used. It can further be used to estimate the relative importance of different combinations of input variables, thereby adding an element of insight to a general machine learning method. The capabilities of this regression tool are demonstrated on test cases involving synthetic analytic functions, the potential energy surface of the water molecule, kinetic energy densities of materials (crystalline magnesium, aluminum, and silicon), and financial market data.
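To make the idea of lower-dimensional terms concrete, the sketch below fits a first-order representation f(x) ≈ f0 + Σ_i f_i(x_i), with each one-dimensional component f_i represented by a Gaussian process posterior mean. It is a minimal illustration under stated assumptions, not the interface or algorithm of the code described above. The names `GPR1D`, `fit_first_order_hdmr`, and `predict_hdmr` are hypothetical. The kernel hyperparameters are fixed by hand rather than optimized. The components are fitted by cycling over them and regressing each on the residual of the others, which is one common way to fit additive models. The variance of each fitted component is used here as a crude proxy for the relative importance of the corresponding variable.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


class GPR1D:
    """Minimal 1-D GP regressor: posterior mean only, fixed hyperparameters.

    Hypothetical helper for illustration; length_scale and noise are
    assumed values, not tuned or taken from the paper.
    """

    def __init__(self, length_scale=0.3, noise=1e-2):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, x, y):
        self.x_train = x
        K = rbf_kernel(x, x, self.length_scale) + self.noise * np.eye(len(x))
        self.alpha = np.linalg.solve(K, y)  # (K + noise * I)^{-1} y
        return self

    def predict(self, x):
        return rbf_kernel(x, self.x_train, self.length_scale) @ self.alpha


def fit_first_order_hdmr(X, y, n_cycles=10):
    """Fit f0 and one 1-D GP component per input variable."""
    n, d = X.shape
    f0 = y.mean()
    contrib = np.zeros((n, d))  # current value of each component at the data
    models = [None] * d
    for _ in range(n_cycles):
        for i in range(d):
            # Residual left after removing f0 and all other components.
            residual = y - f0 - contrib.sum(axis=1) + contrib[:, i]
            models[i] = GPR1D().fit(X[:, i], residual)
            contrib[:, i] = models[i].predict(X[:, i])
    return f0, models


def predict_hdmr(f0, models, X):
    """Evaluate the first-order representation at new points."""
    return f0 + sum(m.predict(X[:, i]) for i, m in enumerate(models))


def target(X):
    """Synthetic additive test function (an assumption for this example)."""
    return np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * X[:, 2]


rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(200, 3))
X_test = rng.uniform(0.0, 1.0, size=(500, 3))

f0, models = fit_first_order_hdmr(X_train, target(X_train))
pred = predict_hdmr(f0, models, X_test)
rmse = np.sqrt(np.mean((pred - target(X_test)) ** 2))

# Variance of each component over the test points, as an importance proxy.
importance = np.array(
    [np.var(m.predict(X_test[:, i])) for i, m in enumerate(models)]
)
print("test RMSE:", rmse)
print("component variances:", importance)
```

Because the test function here is purely additive, first-order terms suffice. A function with coupled variables would need two-dimensional or higher-order component functions, which this sketch does not include.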