Mónica Ribero

LG
h-index: 45
Papers: 8
Citations: 115
Novelty: 55%
AI Score: 43

8 Papers

LG · Aug 15, 2022
Easy Differentially Private Linear Regression

Kareem Amin, Matthew Joseph, Mónica Ribero et al.

Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both present significant practical obstacles. In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.
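For readers unfamiliar with the exponential mechanism mentioned in the abstract above, the following is a minimal sketch of the generic mechanism over a finite candidate set. It is not the paper's algorithm: the `scores` array stands in for any utility with bounded sensitivity (a depth-like score would be one hypothetical choice), and the function name and parameters are assumptions made for illustration.

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Sample index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)).

    `scores` is a hypothetical utility vector; `sensitivity` is assumed
    to bound how much any score changes when one data point changes.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    # Gumbel-max trick: the argmax of logits plus i.i.d. Gumbel noise
    # is distributed as softmax(logits), and avoids overflow in exp().
    gumbel = rng.gumbel(size=scores.shape)
    return int(np.argmax(logits + gumbel))
```

With a small `epsilon` the choice is close to uniform; as `epsilon` grows the mechanism concentrates on the highest-scoring candidate.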

LG · Jul 7, 2024
Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Weiwei Kong, Mónica Ribero

Differentially-private stochastic gradient descent (DP-SGD) is a family of iterative machine learning training algorithms that privatize gradients to generate a sequence of differentially-private (DP) model parameters. It is also the standard tool used to train DP models in practice, even though most users are only interested in protecting the privacy of the final model. Tight DP accounting for the last iterate would minimize the amount of noise required while maintaining the same privacy guarantee and potentially increasing model utility. However, last-iterate accounting is challenging, and existing works require strong assumptions not satisfied by most implementations. These include assuming (i) the global sensitivity constant is known, to avoid gradient clipping; (ii) the loss function is Lipschitz or convex; and (iii) input batches are sampled randomly. In this work, we forgo any unrealistic assumptions and provide privacy bounds for the most commonly used variant of DP-SGD, in which data is traversed cyclically, gradients are clipped, and only the last model is released. More specifically, we establish new Rényi differential privacy (RDP) upper bounds for the last iterate under realistic assumptions of small stepsize and Lipschitz smoothness of the loss function. Our general bounds also recover the special-case convex bounds when the weak-convexity parameter of the objective function approaches zero and no clipping is performed. The approach itself leverages optimal transport techniques for last iterate bounds, which is a nontrivial task when the data is traversed cyclically and the loss function is nonconvex.
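The DP-SGD variant described in this abstract (cyclic traversal, clipped per-example gradients, only the last iterate released) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a squared loss on a linear model for concreteness (the paper treats nonconvex composite losses), and all function and parameter names are hypothetical. It performs no privacy accounting.

```python
import numpy as np

def clip(g, clip_norm):
    """Rescale g so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_norm / max(norm, 1e-12))

def cyclic_dp_sgd(X, y, epochs, batch_size, lr, clip_norm,
                  noise_multiplier, rng=None):
    """Cyclically traverse the data, clip per-example gradients,
    add Gaussian noise, and return only the final iterate."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Fixed-order (cyclic) batches: no random sampling or shuffling.
        for start in range(0, n, batch_size):
            Xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            # Per-example gradients of 0.5 * (x @ w - y)**2 (assumed loss).
            per_example = (Xb @ w - yb)[:, None] * Xb
            clipped = np.stack([clip(g, clip_norm) for g in per_example])
            # Gaussian noise scaled to the clipping norm.
            noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)
            w = w - lr * (clipped.sum(axis=0) + noise) / len(Xb)
    return w  # intermediate iterates are never released
```

Setting `noise_multiplier=0` recovers plain clipped SGD, which is convenient for checking the update logic before adding noise.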

LG · Oct 4, 2023
Differentially Private Optimization for Non-Decomposable Objective Functions

Weiwei Kong, Andrés Muñoz Medina, Mónica Ribero

Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity grows with the batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular, the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.

ML · Jan 27
Regularized $f$-Divergence Kernel Tests

Mónica Ribero, Antonin Schrab, Arthur Gretton

We propose a framework to construct practical kernel-based two-sample tests from the family of $f$-divergences. The test statistic is computed from the witness function of a regularized variational representation of the divergence, which we estimate using kernel methods. The proposed test is adaptive over hyperparameters such as the kernel bandwidth and the regularization parameter. We provide theoretical guarantees for statistical test power across our family of $f$-divergence estimates. While our test covers a variety of $f$-divergences, we bring particular focus to the Hockey-Stick divergence, motivated by its applications to differential privacy auditing and machine unlearning evaluation. For two-sample testing, experiments demonstrate that different $f$-divergences are sensitive to different localized differences, illustrating the importance of leveraging diverse statistics. For machine unlearning, we propose a relative test that distinguishes true unlearning failures from safe distributional variations.
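To make the Hockey-Stick divergence concrete, here is a small sketch for discrete distributions using the standard definition $\sum_x \max(p(x) - \gamma q(x), 0)$. It does not reproduce the paper's kernel-based estimator; the function name is hypothetical, and the randomized-response example is chosen only because its divergence can be checked by hand. The link to privacy auditing is that a mechanism is $(\varepsilon, \delta)$-DP when this divergence at $\gamma = e^{\varepsilon}$ is at most $\delta$ for the output distributions of every pair of neighboring datasets, in both directions.

```python
import numpy as np

def hockey_stick(p, q, gamma):
    """Hockey-stick divergence sum_x max(p(x) - gamma * q(x), 0)
    between two discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.maximum(p - gamma * q, 0.0)))

# Binary randomized response with parameter eps: report the true bit
# with probability e^eps / (1 + e^eps), otherwise flip it.
eps = 1.0
keep = np.exp(eps) / (1.0 + np.exp(eps))
p = np.array([keep, 1.0 - keep])   # output distribution when the bit is 0
q = np.array([1.0 - keep, keep])   # output distribution when the bit is 1

delta_at_eps = hockey_stick(p, q, np.exp(eps))  # zero up to rounding
total_variation = hockey_stick(p, q, 1.0)       # gamma = 1 gives TV distance
```

At $\gamma = e^{\varepsilon}$ the divergence vanishes, matching the fact that randomized response satisfies pure $\varepsilon$-DP, while $\gamma = 1$ recovers the total variation distance.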

CR · Sep 8, 2025
Sequentially Auditing Differential Privacy

Tomás González, Mateo Dulce-Rubio, Aaditya Ramdas et al.

We propose a practical sequential test for auditing differential privacy guarantees of black-box mechanisms. The test processes streams of mechanisms' outputs, providing anytime-valid inference while controlling Type I error, overcoming the fixed sample size limitation of previous batch auditing methods. Experiments show this test detects violations with sample sizes that are orders of magnitude smaller than existing methods, reducing this number from 50K to a few hundred examples, across diverse realistic mechanisms. Notably, it identifies DP-SGD privacy violations in under one training run, unlike prior methods needing full model training.

CR · Jan 28, 2022
A Joint Exponential Mechanism For Differentially Private Top-$k$

Jennifer Gillenwater, Matthew Joseph, Andrés Muñoz Medina et al.

We present a differentially private algorithm for releasing the sequence of $k$ elements with the highest counts from a data domain of $d$ elements. The algorithm is a "joint" instance of the exponential mechanism, and its output space consists of all $O(d^k)$ length-$k$ sequences. Our main contribution is a method to sample this exponential mechanism in time $O(dk\log(k) + d\log(d))$ and space $O(dk)$. Experiments show that this approach outperforms existing pure differential privacy methods and improves upon even approximate differential privacy methods for moderate $k$.

LG · Aug 14, 2020
Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces

Peter Kairouz, Mónica Ribero, Keith Rush et al.

We revisit the problem of empirical risk minimization (ERM) with differential privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions on the subspace from which gradients can be drawn, achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise. We show a convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of the gradient subspace. Since $\text{Tr}(G_T)=O(\sqrt{T})$, we can obtain faster rates for convex and Lipschitz functions, compared to the $O(1/\sqrt{T})$ rate achieved by known versions of noisy (stochastic) gradient descent with comparable noise variance. In particular, we show that if the gradients lie in a known constant rank subspace, and assuming algorithmic access to an envelope which bounds decaying sensitivity, one can achieve faster convergence to an excess empirical risk of $\tilde O(1/εn)$, where $ε$ is the privacy budget and $n$ the number of samples. Letting $p$ be the problem dimension, this result implies that, by running noisy AdaGrad, we can bypass the DP-SGD bound $\tilde O(\sqrt{p}/εn)$ in $T=(εn)^{2/(1+2α)}$ iterations, where $α\geq 0$ is a parameter controlling gradient norm decay, instead of the rate achieved by SGD of $T=ε^2n^2$. Our results operate with general convex functions in both constrained and unconstrained minimization. Along the way, we carry out a perturbation analysis of noisy AdaGrad that is of independent interest. Our utility guarantee for the private ERM problem follows as a corollary to the regret guarantee of noisy AdaGrad.
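As a rough illustration of what "noisy AdaGrad" means, the sketch below adds Gaussian noise to each gradient before a diagonal AdaGrad update. It rests on several assumptions: the diagonal preconditioner, the names `grad_fn` and `noise_std`, and the absence of any subspace projection or sensitivity envelope are simplifications for illustration, not the variant analysed in the paper. No privacy guarantee follows from this code alone, since it neither bounds gradient sensitivity nor tracks a privacy budget.

```python
import numpy as np

def noisy_adagrad(grad_fn, w0, steps, lr, noise_std, eps=1e-8, rng=None):
    """Diagonal AdaGrad driven by Gaussian-perturbed gradients.

    grad_fn   : callable returning the gradient at w (hypothetical interface)
    noise_std : standard deviation of the Gaussian noise added per step
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.array(w0, dtype=float)
    accum = np.zeros_like(w)  # running sum of squared noisy gradients
    for _ in range(steps):
        g = grad_fn(w) + rng.normal(0.0, noise_std, size=w.shape)
        accum += g * g
        # Per-coordinate step size shrinks as squared gradients accumulate.
        w -= lr * g / (np.sqrt(accum) + eps)
    return w

# Usage on the convex quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
w_final = noisy_adagrad(lambda w: w, w0=[1.0, -2.0], steps=200,
                        lr=0.5, noise_std=0.0,
                        rng=np.random.default_rng(0))
```

With `noise_std=0.0` the run is deterministic and the iterate approaches the minimizer at the origin; increasing `noise_std` shows how the accumulated noise also inflates the preconditioner.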

IR · Mar 1, 2020
Federating Recommendations Using Differentially Private Prototypes

Mónica Ribero, Jette Henderson, Sinead Williamson et al.

Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently leaking private information? We propose a new federated approach to learning global and local private models for recommendation without collecting raw data, user statistics, or information about personal preferences. Our method produces a set of prototypes that allows us to infer global behavioral patterns, while providing differential privacy guarantees for users in any database of the system. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss associated with iterative procedures. We test our framework on synthetic data as well as real federated medical data and MovieLens ratings data. We show that local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models, both in terms of accuracy of matrix reconstruction and in terms of relevance of the recommendations, while maintaining provable privacy guarantees. We also show that our method is more robust and is characterized by smaller variance than individual models learned by independent entities.