Stronger Coreset Bounds for Kernel Density Estimators via Chaining
This work addresses the computational efficiency of kernel density estimation for machine learning practitioners by providing stronger coreset bounds, though it is incremental as it builds on existing techniques like discrepancy and chaining.
The paper tackles the problem of constructing coresets for kernel density estimators by applying the discrepancy method and chaining to achieve improved bounds on coreset sizes, such as O(√d/ε √log log 1/ε) for Gaussian and Laplacian kernels under uniform boundedness, and O(1/ε √log log 1/ε) for Laplacian kernels with constant dimension d.
We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels in the case that the data set is uniformly bounded, an improvement that was not possible with previous techniques. We also obtain coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Laplacian kernel for $d$ constant. Finally, we give the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,α\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS Kernels, where $1/α$ is the bandwidth parameter of the kernel.