COML · Nov 24, 2020

Random Sampling High Dimensional Model Representation Gaussian Process Regression (RS-HDMR-GPR) for representing multidimensional functions with machine-learned lower-dimensional terms allowing insight with a general method

arXiv:2012.02704v5 · 32 citations
AI Analysis

This method provides a general tool for representing multidimensional functions and gaining insight into variable importance, which could benefit researchers working with complex, high-dimensional data across various scientific and financial domains.

This paper introduces RS-HDMR-GPR, a Python implementation for representing multivariate functions using lower-dimensional terms. It aims to recover functional dependencies from sparse data, impute missing values, and prune HDMR terms, demonstrating its capabilities on synthetic functions, molecular potential energy surfaces, kinetic energy densities of materials, and financial market data.

We present a Python implementation of RS-HDMR-GPR (Random Sampling High Dimensional Model Representation Gaussian Process Regression). The method builds representations of multivariate functions with lower-dimensional terms, either as an expansion over orders of coupling or using terms of only a given dimensionality. This facilitates, in particular, recovering functional dependence from sparse data. The code also allows for imputation of missing values of the variables and for significant pruning of the number of HDMR terms that are useful to retain. The code can also be used to estimate the relative importance of different combinations of input variables, thereby adding an element of insight to a general machine learning method. The capabilities of this regression tool are demonstrated on test cases involving synthetic analytic functions, the potential energy surface of the water molecule, kinetic energy densities of materials (crystalline magnesium, aluminum, and silicon), and financial market data.
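To make the idea of "lower-dimensional terms" concrete, the following is a minimal sketch of a first-order expansion, f(x) ≈ f0 + Σ f_i(x_i), in which each one-dimensional term is the posterior mean of a small Gaussian process fitted to the residual left by the other terms. It is not the authors' code and does not use their API: the function names (`rbf_kernel`, `fit_gp_1d`, `fit_first_order_hdmr`), the fixed kernel length scale, the noise level, the number of fitting cycles, and the synthetic test function are all assumptions chosen for illustration, and the GP is written directly in NumPy rather than with any particular library.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def fit_gp_1d(x, r, noise=1e-2, length_scale=0.3):
    """Return a callable giving the GP posterior mean fitted to targets r at x."""
    K = rbf_kernel(x, x, length_scale) + noise * np.eye(len(x))
    alpha = np.linalg.solve(K, r)
    return lambda xs: rbf_kernel(xs, x, length_scale) @ alpha

def fit_first_order_hdmr(X, y, n_cycles=10):
    """Fit f(x) ~ f0 + sum_i f_i(x_i) by cycling over the 1-D terms.

    Each term is refitted to the residual of all other terms
    (a backfitting-style loop; the cycle count is an assumption).
    """
    n, d = X.shape
    f0 = y.mean()
    comps = np.zeros((d, n))      # values of each f_i at the training points
    models = [None] * d
    for _ in range(n_cycles):
        for i in range(d):
            resid = y - f0 - comps.sum(axis=0) + comps[i]
            models[i] = fit_gp_1d(X[:, i], resid)
            comps[i] = models[i](X[:, i])

    def predict(Xs):
        return f0 + sum(m(Xs[:, i]) for i, m in enumerate(models))

    return predict, comps

# Hypothetical synthetic target: additive in x0 and x1, independent of x2.
def target(X):
    return np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(200, 3))   # randomly sampled inputs
X_test = rng.uniform(0.0, 1.0, size=(500, 3))

predict, comps = fit_first_order_hdmr(X_train, target(X_train))

rmse = np.sqrt(np.mean((predict(X_test) - target(X_test)) ** 2))
importance = comps.var(axis=1)   # variance of each term as a crude importance measure

print("test RMSE:", rmse)
print("per-variable term variance:", importance)
```

In this sketch the variance of each fitted term serves as a rough stand-in for the relative importance of an input variable: the term for the unused variable `x2` should come out with a much smaller variance than the other two. Higher-order terms (pairs, triples of variables) would follow the same pattern with multi-dimensional kernels, at the cost of many more terms to fit, which is where pruning becomes relevant.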

Code Implementations: 1 repo