LG QUANT-PH Aug 2, 2022

Fast Kernel Density Estimation with Density Matrices and Random Fourier Features

Joseph A. Gallego, Juan F. Osorio, Fabio A. González

arXiv:2208.01206v24.611 citations h-index: 7 Has Code

Originality Incremental advance

AI Analysis

This work addresses the computational bottleneck of KDE in big data applications, though it is an incremental improvement that builds on existing approximation techniques.

The paper tackles the inefficiency of kernel density estimation (KDE) for big data by proposing DMKDE, a method using density matrices and random Fourier features, which achieves competitive performance with state-of-the-art fast KDE approximations, particularly showing advantages on high-dimensional data.

Kernel density estimation (KDE) is one of the most widely used nonparametric density estimation methods. Because it is a memory-based method, i.e., it uses the entire training data set for prediction, it is unsuitable for most current big data applications. Several strategies, such as tree-based or hashing-based estimators, have been proposed to improve the efficiency of kernel density estimation. The novel density matrix kernel density estimation method (DMKDE) uses density matrices, a quantum mechanical formalism, and random Fourier features, an explicit kernel approximation, to produce density estimates. The method has its roots in KDE and can be regarded as an approximation of it that does not share its memory-based restriction. In this paper, we systematically evaluate the DMKDE algorithm and compare it with other state-of-the-art fast procedures for approximating kernel density estimation on different synthetic data sets. Our experimental results show that DMKDE is on par with its competitors for computing density estimates and shows advantages on high-dimensional data. We have made all the code available as an open source software repository.
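
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a density estimate built from random Fourier features and a density matrix. It is an illustration under stated assumptions, not the authors' released code: the Gaussian kernel exp(-gamma * ||x - y||^2), the normalization constant (pi / (2 * gamma))^(d/2) derived from it, and all function names (`make_rff`, `fit_density_matrix`, `dmkde_predict`, `exact_kde`) are choices made for this example.

```python
import numpy as np


def make_rff(dim, n_features, gamma, rng):
    """Return a random Fourier feature map phi with phi(x) . phi(y) ~ exp(-gamma * ||x - y||^2).

    Assumption of this sketch: a Gaussian kernel, whose spectral
    distribution is N(0, 2 * gamma * I).
    """
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return phi


def fit_density_matrix(X, phi):
    """Average the outer products phi(x_i) phi(x_i)^T over the training set.

    The result has shape (n_features, n_features), independent of the
    number of training points, so prediction no longer needs the data.
    """
    Phi = phi(X)
    return Phi.T @ Phi / X.shape[0]


def dmkde_predict(Xq, phi, rho, gamma):
    """Estimate the density at each query point as phi(x)^T rho phi(x) / M."""
    dim = Xq.shape[1]
    Phi_q = phi(Xq)
    quad = ((Phi_q @ rho) * Phi_q).sum(axis=1)
    # phi(x)^T rho phi(x) approximates the mean of k(x, x_i)^2, i.e. a
    # Gaussian kernel exp(-2 * gamma * r^2), which integrates to M below.
    norm = (np.pi / (2.0 * gamma)) ** (dim / 2.0)
    return quad / norm


def exact_kde(Xq, X, gamma):
    """Memory-based reference: the exact KDE that the sketch approximates."""
    dim = Xq.shape[1]
    sq_dist = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    norm = (np.pi / (2.0 * gamma)) ** (dim / 2.0)
    return np.exp(-2.0 * gamma * sq_dist).mean(axis=1) / norm
```

A typical use would fit `rho` once on the training data and then call `dmkde_predict` on new points. The cost of a prediction depends on the number of random features rather than on the training set size, which is the sense in which the memory-based restriction is removed. Accuracy depends on the number of features and on `gamma`.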
