Adrien Saumard

ST
3papers
66citations
Novelty53%
AI Score24

3 Papers

MEFeb 10, 2020
K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

Camille Brunet-Saumard, Edouard Genetay, Adrien Saumard

We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that if enough blocks are generated through the bootstrap sampling, then it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM enables to take many blocks of a desired size, thus avoiding possible disappearance of clusters in some blocks, a pitfall that can occur for the partition-based generation of blocks of the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means based methods. Guidelines are provided for tuning the hyper-parameters K-bMOM in practice. It is also recommended to the practitionner to use such a robust approach to initialize their clustering algorithm. Finally, considering a simplified and theoretical version of our estimator, we prove its robustness to adversarial contamination by deriving robust rates of convergence for the K-means distorsion. To our knowledge, it is the first result of this kind for the K-means distorsion.

STMar 5, 2019
Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea, Amandine Dubois, Martin Kroll et al.

We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $α$-differential privacy and provide a lower bound on the rate of convergence over Besov spaces $B^s_{pq}$ under mean integrated $\mathbb L^r$-risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfil the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $α$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $p \geq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.

STFeb 16, 2017
A concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise

Adrien Saumard

We prove a new and general concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise. No specific structure is required on the model, except the existence of a suitable function that controls the local suprema of the empirical process. So far, only the case of linear contrast estimation was tackled in the literature with this level of generality on the model. We solve here the case of a quadratic contrast, by separating the behavior of a linearized empirical process and the empirical process driven by the squares of functions of models.