Density Estimation via Binless Multidimensional Integration
This work addresses density estimation for high-dimensional data in fields like chemical physics, representing an incremental improvement over existing nonparametric methods.
The paper tackles the problem of nonparametric density estimation in high-dimensional spaces by introducing the Binless Multidimensional Thermodynamic Integration (BMTI) method, which reconstructs smooth density profiles without binning and outperforms traditional estimators on synthetic and chemical physics datasets.
We introduce the Binless Multidimensional Thermodynamic Integration (BMTI) method for nonparametric, robust, and data-efficient density estimation. BMTI estimates the logarithm of the density by first computing log-density differences between neighbouring data points. These differences are then integrated, weighted by their associated uncertainties, using a maximum-likelihood formulation. This procedure can be seen as an extension of thermodynamic integration, a technique developed in statistical physics, to a multidimensional setting. The method leverages the manifold hypothesis, estimating quantities within the intrinsic data manifold without defining an explicit coordinate map. It does not rely on any binning or space partitioning, but rather on the construction of a neighbourhood graph based on an adaptive bandwidth selection procedure. BMTI mitigates the limitations commonly associated with traditional nonparametric density estimators, effectively reconstructing smooth profiles even in high-dimensional embedding spaces. The method is tested on a variety of complex synthetic high-dimensional datasets, where it is shown to outperform traditional estimators, and is benchmarked on realistic datasets from the chemical physics literature.
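The integration step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each log-density difference between two neighbouring points carries an independent Gaussian error, so that maximising the likelihood reduces to a weighted least-squares problem on the neighbourhood graph. The function name `integrate_log_density_differences`, the edge format `(i, j, delta, sigma)`, and the Gauss-Seidel solver are all hypothetical choices made for illustration. The sketch takes the differences and their uncertainties as given, and does not cover how BMTI estimates them or selects the adaptive bandwidth.

```python
def integrate_log_density_differences(n_points, edges, n_sweeps=500):
    """Recover log-density values from noisy pairwise differences.

    edges: list of (i, j, delta, sigma), where delta estimates g[j] - g[i]
    and sigma is its assumed Gaussian standard deviation (hypothetical format).
    Minimises sum_ij (g[j] - g[i] - delta)^2 / sigma^2 with g[0] fixed to 0.
    """
    # For every point, store (neighbour, offset, weight) so that
    # g[point] is predicted as g[neighbour] + offset.
    neighbours = [[] for _ in range(n_points)]
    for i, j, delta, sigma in edges:
        weight = 1.0 / sigma ** 2
        neighbours[j].append((i, delta, weight))
        neighbours[i].append((j, -delta, weight))

    g = [0.0] * n_points  # point 0 anchors the arbitrary additive constant
    for _ in range(n_sweeps):
        # Gauss-Seidel sweep: set each g[k] to the precision-weighted
        # average of the values predicted by its neighbours.
        for k in range(1, n_points):
            if not neighbours[k]:
                continue  # isolated point: left at 0
            num = sum(w * (g[m] + d) for m, d, w in neighbours[k])
            den = sum(w for _, _, w in neighbours[k])
            g[k] = num / den
    return g


# Two conflicting estimates of the same difference; the more precise one
# (smaller sigma) pulls the result towards its own value.
example = integrate_log_density_differences(
    2, [(0, 1, 1.0, 1.0), (0, 1, 2.0, 0.5)]
)
print(example)
```

The result is a log-density defined only up to an additive constant, which this sketch fixes by setting the value at point 0 to zero. Normalising the density would be a separate step. The iterative solver keeps the example dependency-free. For a large neighbourhood graph, a sparse linear solver would be the more natural choice.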