Anisotropic oracle inequalities in noisy quantization
This work addresses noisy data in quantization and clustering; the contribution is incremental, extending existing methods to handle errors in variables.
The paper tackles the problem of errors in variables in quantization. It proves exact and non-exact oracle inequalities with fast rates for empirical minimization based on noisy samples, and applies them to $k$-means clustering with noisy data, where fast convergence rates are achieved under standard assumptions.
The effect of errors in variables in quantization is investigated. We prove general exact and non-exact oracle inequalities with fast rates for an empirical minimization based on a noisy sample $Z_i = X_i + \varepsilon_i$, $i = 1, \ldots, n$, where the $X_i$ are i.i.d. with density $f$ and the $\varepsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry of the density $f$ and on the asymptotic behaviour of the characteristic function of $\eta$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions.
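To make the deconvolution idea concrete, here is a minimal one-dimensional sketch, not the paper's procedure. It assumes a standard deconvolution kernel estimator $\hat f_\lambda(x) = \frac{1}{n\lambda}\sum_{i=1}^n \tilde K_\lambda\big(\frac{Z_i - x}{\lambda}\big)$ with $\mathcal F[\tilde K_\lambda](t) = \mathcal F[K](t)/\mathcal F[\eta](t/\lambda)$, a Gaussian kernel $K$, Laplace noise with known scale $b$ (so that $\tilde K_\lambda = K - (b/\lambda)^2 K''$ in closed form), and a bandwidth $\lambda$ chosen by hand. The empirical risk $\int \min_j (x - c_j)^2 \hat f_\lambda(x)\,dx$ is approximated by a Riemann sum on a grid and minimized with weighted Lloyd updates. The function names `deconv_kernel`, `deconv_density` and `deconv_kmeans`, and all parameter values, are hypothetical choices made for this illustration.

```python
import numpy as np


def deconv_kernel(u, b, lam):
    """Deconvolution kernel for a Gaussian K and (assumed) Laplace(0, b) noise:
    K_tilde(u) = K(u) - (b/lam)**2 * K''(u), where K''(u) = (u**2 - 1) * K(u)."""
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 - (b / lam) ** 2 * (u**2 - 1.0))


def deconv_density(z, grid, b, lam):
    """Signed estimate of f on `grid`: (1/(n*lam)) * sum_i K_tilde((z_i - x)/lam)."""
    u = (z[:, None] - grid[None, :]) / lam
    return deconv_kernel(u, b, lam).mean(axis=0) / lam


def deconv_kmeans(z, k, b, lam, n_grid=1000, n_iter=100):
    """Hypothetical sketch: minimize a grid approximation of
    R_n(c) = int min_j (x - c_j)**2 * f_hat(x) dx via weighted Lloyd updates."""
    grid = np.linspace(z.min() - 3 * lam, z.max() + 3 * lam, n_grid)
    dx = grid[1] - grid[0]
    w = deconv_density(z, grid, b, lam) * dx  # signed quadrature weights
    # Deterministic initialization at evenly spaced quantiles of the noisy sample.
    centers = np.quantile(z, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each grid point to its nearest center (Voronoi cells V_j).
        labels = np.argmin((grid[:, None] - centers[None, :]) ** 2, axis=1)
        new = centers.copy()
        for j in range(k):
            in_cell = labels == j
            mass = w[in_cell].sum()
            if mass > 0:
                # First-order condition: c_j = int_{V_j} x f_hat / int_{V_j} f_hat.
                new[j] = (w[in_cell] * grid[in_cell]).sum() / mass
        if np.allclose(new, centers):
            break
        centers = new
    dist2 = np.min((grid[:, None] - centers[None, :]) ** 2, axis=1)
    risk = float((w * dist2).sum())
    return np.sort(centers), risk


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    # Two-component mixture for X, corrupted by Laplace noise eps.
    x = np.where(rng.random(n) < 0.5, -3.0, 3.0) + 0.5 * rng.standard_normal(n)
    z = x + rng.laplace(0.0, 0.5, size=n)
    centers, risk = deconv_kmeans(z, k=2, b=0.5, lam=0.5)
    print(centers, risk)
```

Because $\hat f_\lambda$ can take negative values, the Lloyd update here only solves the first-order condition cell by cell; unlike standard $k$-means it is not guaranteed to decrease the risk at every step, so it should be read as a heuristic for illustration rather than as the stochastic minimization studied in the paper.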