Huber-energy measure quantization
This work addresses measure quantization for applications in statistics and machine learning, presenting an incremental improvement with a new unbiased optimization procedure.
The authors tackle the problem of approximating a probability measure by a sum of Dirac masses. They minimize a statistical distance built from a negative definite kernel and propose the HEMQ algorithm, which uses unbiased estimators of that distance for optimization. They tested HEMQ on datasets such as Gaussian mixtures and MNIST and found it robust and versatile, with results that match the expected intuitive behavior for Huber-energy kernels.
We describe a measure quantization procedure, i.e., an algorithm which finds the best approximation of a target probability law (and, more generally, of a signed finite variation measure) by a sum of $Q$ Dirac masses ($Q$ being the quantization parameter). The procedure is implemented by minimizing the statistical distance between the original measure and its quantized version; the distance is built from a negative definite kernel and, if necessary, can be computed on the fly and fed to a stochastic optimization algorithm (such as SGD, Adam, ...). We investigate theoretically the fundamental question of the existence of the optimal measure quantizer and identify the kernel properties required to guarantee suitable behavior. We propose two best linear unbiased (BLUE) estimators for the squared statistical distance and use them in an unbiased procedure, called HEMQ, to find the optimal quantization. We test HEMQ on several databases: multi-dimensional Gaussian mixtures, Wiener space cubature, Italian wine cultivars and the MNIST image database. The results indicate that the HEMQ algorithm is robust and versatile and, for the class of Huber-energy kernels, matches the expected intuitive behavior.
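To make the procedure concrete, here is a minimal sketch of kernel-distance quantization by stochastic gradient descent. It is an illustration under stated assumptions, not the authors' HEMQ implementation or their BLUE estimators. It assumes a kernel of the form $h(u) = \sqrt{a^2 + |u|^2} - a$, uniform weights $1/Q$ on the Dirac masses, and the usual energy-type squared distance $d^2(\mu,\nu) = \mathbb{E}\,h(X-Y) - \tfrac12 \mathbb{E}\,h(X-X') - \tfrac12 \mathbb{E}\,h(Y-Y')$, estimated on a fresh batch at every step. The names `huber_energy`, `distance_sq_estimate`, `grad_z`, `quantize` and the `sample(rng, n)` callback are hypothetical and chosen for this example only.

```python
import numpy as np


def huber_energy(diff, a=1e-3):
    """Assumed kernel h(u) = sqrt(a^2 + |u|^2) - a, applied along the last axis."""
    return np.sqrt(a * a + np.sum(diff * diff, axis=-1)) - a


def distance_sq_estimate(x, z, a=1e-3):
    """Estimate the squared kernel distance between the law sampled by the
    batch x (shape (B, d)) and the uniform quantizer on z (shape (Q, d))."""
    B, Q = len(x), len(z)
    cross = huber_energy(x[:, None, :] - z[None, :, :], a).mean()
    # h(0) = 0, so the diagonal terms of both double sums vanish.
    zz = huber_energy(z[:, None, :] - z[None, :, :], a).sum() / (Q * Q)
    xx = huber_energy(x[:, None, :] - x[None, :, :], a).sum() / (B * (B - 1))
    return cross - 0.5 * zz - 0.5 * xx


def grad_z(x, z, a=1e-3):
    """Gradient of distance_sq_estimate with respect to the points z."""
    B, Q = len(x), len(z)
    dzx = z[:, None, :] - x[None, :, :]  # (Q, B, d)
    dzz = z[:, None, :] - z[None, :, :]  # (Q, Q, d)
    gx = dzx / np.sqrt(a * a + np.sum(dzx**2, axis=-1, keepdims=True))
    gz = dzz / np.sqrt(a * a + np.sum(dzz**2, axis=-1, keepdims=True))
    # Attraction towards the batch minus repulsion between quantization points.
    return gx.sum(axis=1) / (B * Q) - gz.sum(axis=1) / (Q * Q)


def quantize(sample, z0, steps=500, batch=64, lr=0.5, a=1e-3, seed=0):
    """Plain SGD on the Q points; sample(rng, n) must return an (n, d) array."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)  # copy, so z0 is left untouched
    for _ in range(steps):
        x = sample(rng, batch)  # fresh batch drawn on the fly
        z = z - lr * grad_z(x, z, a)
    return z


# Usage: quantize a shifted 2-D Gaussian with Q = 8 points.
def gaussian(rng, n):
    return rng.normal(size=(n, 2)) + 3.0


z_init = 0.1 * np.random.default_rng(1).normal(size=(8, 2))
z_opt = quantize(gaussian, z_init)
```

The gradient is written by hand to keep the sketch dependency-free; an automatic differentiation framework with Adam in place of plain SGD would fit the same structure. The term involving only the batch does not depend on the quantization points, so it affects the reported distance but not the optimization.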