Ossi Räisä

h-index: 1

8 papers

40 citations

Novelty: 54%

AI Score: 55

Ranked #8,643 of 194,257 authors (top 4%); #2,279 in LG (top 6%)

8 Papers

13.1 · ML · May 28, 2022 · Code

Noise-Aware Statistical Inference with Differentially Private Synthetic Data

Ossi Räisä, Joonas Jälkö, Samuel Kaski et al.

While generation of synthetic data under differential privacy (DP) has received a lot of attention in the data privacy community, analysis of synthetic data has received much less. Existing work has shown that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities. For example, confidence intervals become too narrow, which we demonstrate with a simple experiment. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation (MI), and synthetic data generation using noise-aware (NA) Bayesian modeling into a pipeline NA+MI that allows computing accurate uncertainty estimates for population-level quantities from DP synthetic data. To implement NA+MI for discrete data generation using the values of marginal queries, we develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy. Our experiments demonstrate that the pipeline is able to produce accurate confidence intervals from DP synthetic data. The intervals become wider with tighter privacy to accurately capture the additional uncertainty stemming from DP noise.
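As a rough illustration of the multiple-imputation side of such a pipeline, the sketch below combines point estimates and variance estimates from several synthetic datasets using the classic Rubin's rules for missing-data imputation. This is an assumption for illustration: the abstract does not state which combining rules NA+MI uses, and variants designed for synthetic data differ in the total-variance formula. The function names and the normal-approximation interval are hypothetical choices as well.

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine per-dataset estimates with the classic Rubin's rules.

    estimates: point estimate of the quantity from each synthetic dataset
    variances: variance estimate of that point estimate from each dataset
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two datasets and matching lengths")
    q_bar = mean(estimates)      # combined point estimate
    u_bar = mean(variances)      # average within-dataset variance
    b = variance(estimates)      # between-dataset variance (divides by m - 1)
    total_var = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total_var


def normal_interval(q_bar, total_var, z=1.96):
    """Normal-approximation interval (assumption: a t-quantile is often used instead)."""
    half_width = z * math.sqrt(total_var)
    return q_bar - half_width, q_bar + half_width


if __name__ == "__main__":
    q, t = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(q, t, normal_interval(q, t))
```

The between-dataset term is what widens the interval relative to analysing a single synthetic dataset as if it were real, which is the failure mode the abstract describes.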

7.8 · LG · May 25

On Reliability of Efficient Membership Inference Vulnerability Evaluation

Joonas Jälkö, Gauri Pradhan, Ossi Räisä et al.

Membership inference attacks (MIAs) are popular methods for empirically assessing the leakage of sensitive information in the training data through models or statistics learned from the data. The MIA vulnerability is often evaluated through the false positive rate (FPR) and true positive rate (TPR) of a binary classifier that tries to predict whether a particular sample was in the training data. However, in order to reliably estimate the TPR especially for low FPR values, a lot of observations are needed, which in the case of MIA translates to many target models, leading to large computational cost. To avoid excessive compute requirements, the MIA scores are often averaged over multiple individuals and multiple targeted models. We demonstrate two key weaknesses in this efficient MIA evaluation pipeline. First, we show that evaluating the TPR based on MIA scores concatenated across multiple individuals, commonly used to study vulnerabilities in the very low FPR regime, is not calibrated across the per-sample FPRs. This makes it unreliable as a tool for auditing differential privacy. To solve this, we propose a post-processing method to effectively calibrate the FPR across different samples. Second, we identify a finite population bias in the commonly used efficient likelihood-ratio attack (LiRA) implementation proposed by Carlini et al. 2022, leading to a positive bias in the per-sample vulnerability.
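To make the TPR-at-fixed-FPR evaluation concrete, here is a minimal sketch of how a threshold can be chosen from non-member scores and then applied to member scores. It is a generic illustration, not the calibration method or the LiRA implementation from the paper; the function name and the convention that higher scores mean "more likely a member" are assumptions.

```python
import math


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Empirical TPR at an FPR no larger than target_fpr.

    Assumption: a sample is predicted to be a member when its score is
    strictly greater than the threshold.
    Returns (tpr, achieved_fpr, threshold).
    """
    n = len(nonmember_scores)
    allowed_fp = math.floor(target_fpr * n)  # false positives we may make
    ranked = sorted(nonmember_scores, reverse=True)
    # Threshold at the (allowed_fp + 1)-th largest non-member score, so at
    # most allowed_fp non-members score strictly above it.
    threshold = ranked[allowed_fp] if allowed_fp < n else -math.inf
    tpr = sum(s > threshold for s in member_scores) / len(member_scores)
    fpr = sum(s > threshold for s in nonmember_scores) / n
    return tpr, fpr, threshold


if __name__ == "__main__":
    nonmembers = [float(i) for i in range(100)]
    members = [float(i) for i in range(50, 150)]
    print(tpr_at_fpr(members, nonmembers, 0.25))
```

With this estimator, a target FPR of 0.001 allows zero false positives unless there are at least 1,000 non-member scores, which illustrates why the low-FPR regime needs many observations.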

6.6 · CR · May 12

$f$-Differential Privacy Filters: Validity and Approximate Solutions

Long Tran, Antti Koskela, Ossi Räisä et al.

Accounting for privacy loss under fully adaptive composition -- where mechanism choice and privacy parameters may depend on the history of prior outputs -- is a central challenge in differential privacy (DP). Here, privacy filters are stopping rules ensuring a prescribed global budget is not exceeded. A leading candidate for optimal filter design is $f$-DP, which characterizes the full extent of adversarial hypothesis testing and recovers $(\varepsilon,\delta)$-DP through piece-wise linear trade-off functions, while enabling tight $(\varepsilon,\delta)$-DP accounting in standard compositions via tensor products. Yet whether such filters can be correctly defined under $f$-DP remains unclear. We show that the natural $f$-DP filter -- tracking path-wise accumulating tensor products and stopping when the prescribed curve is crossed -- is fundamentally invalid, precluding the direct use of standard efficient numerical Fast-Fourier-Transform accounting in the fully adaptive setting. We characterize this failure, establishing necessary and sufficient conditions for the natural filter's validity. Furthermore, we prove a fully adaptive central limit theorem for $f$-DP, establishing Gaussian convergence of cumulative privacy losses under full adaptivity. As a demonstration, we construct a closed-form approximate GDP filter for subsampled Gaussian mechanisms that provably outperforms RDP-based accounting in asymptotic regimes ($q\ll 1$ and $q\approx 1$) without tracking the full trade-off function, demonstrating that the slack in RDP is not intrinsic to adaptive composition -- though CLT-based approximations are known to be optimistic at realistic subsampling rates, a limitation that remains an open challenge.
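For readers unfamiliar with Gaussian DP (GDP), the sketch below collects three standard facts from the $f$-DP literature: the GDP trade-off function $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, composition of GDP mechanisms with fixed parameters, and conversion from $\mu$-GDP to $(\varepsilon,\delta)$-DP. It covers only standard, non-adaptive composition and is not the filter constructed in the paper; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gdp_tradeoff(alpha, mu):
    """Trade-off function of mu-GDP: the smallest type II error achievable
    at type I error alpha, for alpha strictly between 0 and 1."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def compose_gdp(mus):
    """Composition of GDP mechanisms whose parameters are fixed in advance:
    the result is GDP with mu = sqrt(sum of squared mus)."""
    return math.sqrt(sum(m * m for m in mus))


def gdp_delta(eps, mu):
    """delta(eps) such that a mu-GDP mechanism (mu > 0) is (eps, delta)-DP."""
    return (_STD_NORMAL.cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * _STD_NORMAL.cdf(-eps / mu - mu / 2.0))


if __name__ == "__main__":
    mu_total = compose_gdp([0.3] * 10)
    print(mu_total, gdp_delta(1.0, mu_total), gdp_tradeoff(0.05, mu_total))
```

In the fully adaptive setting the abstract addresses, the per-step parameters depend on earlier outputs, so the simple sum of squares above cannot be taken for granted.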

13.1 · ML · Feb 6, 2024 · Code

Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation

Ossi Räisä, Joonas Jälkö, Antti Honkela

We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP deep learning, its properties have been widely studied, and recent works have empirically found large batch sizes to be beneficial. However, theoretical explanations of this benefit are currently heuristic at best. We first observe that the total gradient variance in DP-SGD can be decomposed into subsampling-induced and noise-induced variances. We then prove that in the limit of an infinite number of iterations, the effective noise-induced variance is invariant to the batch size. The remaining subsampling-induced variance decreases with larger batch sizes, so large batches reduce the effective total gradient variance. We confirm numerically that the asymptotic regime is relevant in practical settings when the batch size is not small, and find that outside the asymptotic regime, the total gradient variance decreases even more with large batch sizes. We also find a sufficient condition that implies that large batch sizes similarly reduce effective DP noise variance for one iteration of DP-SGD.
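The noise-induced part of the variance can be seen in a minimal sketch of one DP-SGD gradient estimate: clip each per-example gradient, sum, add Gaussian noise, and divide by the batch size. The sketch assumes a fixed batch size and a fixed noise multiplier, and it performs no privacy accounting, so it does not reproduce the paper's result, which concerns how the noise level needed for a given privacy guarantee interacts with the batch size. All names are hypothetical.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Noisy average of clipped per-example gradients for one batch."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    # is added to the sum, once per coordinate.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    batch_size = len(per_example_grads)
    return [x / batch_size for x in noisy]


def noise_variance_per_coordinate(batch_size, clip_norm, noise_multiplier):
    """Variance that the added noise contributes to each coordinate of the
    averaged gradient, for a fixed noise multiplier."""
    return (noise_multiplier * clip_norm / batch_size) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_sgd_gradient(grads, 1.0, 1.0, rng))
    print(noise_variance_per_coordinate(2, 1.0, 1.0))
```

The subsampling-induced variance comes from which examples land in the batch and is not modelled here; only the added-noise term is made explicit.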

15.7 · LG · May 28, 2025

Position: All Current Generative Fidelity and Diversity Metrics are Flawed

Ossi Räisä, Boris van Breugel, Mihaela van der Schaar

Any method's development and practical application is limited by our ability to measure its reliability. The popularity of generative modeling emphasizes the importance of good synthetic data metrics. Unfortunately, previous works have found many failure cases in current metrics, for example lack of outlier robustness and unclear lower and upper bounds. We propose a list of desiderata for synthetic data metrics, and a suite of sanity checks: carefully chosen simple experiments that aim to detect specific and known generative modeling failure modes. Based on these desiderata and the results of our checks, we arrive at our position: all current generative fidelity and diversity metrics are flawed. This significantly hinders practical use of synthetic data. Our aim is to convince the research community to spend more effort in developing metrics, instead of models. Additionally, through analyzing how current metrics fail, we provide practitioners with guidelines on how these metrics should (not) be used.

6.4 · LG · Feb 6, 2024 · Code

A Bias-Variance Decomposition for Ensembles over Multiple Synthetic Datasets

Ossi Räisä, Antti Honkela

Recent studies have highlighted the benefits of generating multiple synthetic datasets for supervised learning, from increased accuracy to more effective model selection and uncertainty estimation. These benefits have clear empirical support, but the theoretical understanding of them is currently very light. We seek to increase the theoretical understanding by deriving bias-variance decompositions for several settings of using multiple synthetic datasets, including differentially private synthetic data. Our theory yields a simple rule of thumb to select the appropriate number of synthetic datasets in the case of mean-squared error and Brier score. We investigate how our theory works in practice with several real datasets, downstream predictors and error metrics. As our theory predicts, multiple synthetic datasets often improve accuracy, while a single large synthetic dataset gives at best minimal improvement, showing that our insights are practically relevant.

4.6 · LG · Jun 12, 2024 · Code

Noise-Aware Differentially Private Regression via Meta-Learning

Ossi Räisä, Stratis Markou, Matthew Ashman et al.

Many high-stakes applications require machine learning models that protect user privacy and provide well-calibrated, accurate predictions. While Differential Privacy (DP) is the gold standard for protecting user privacy, standard DP mechanisms typically significantly impair performance. One approach to mitigating this issue is pre-training models on simulated data before DP learning on the private data. In this work we go a step further, using simulated data to train a meta-learning model that combines the Convolutional Conditional Neural Process (ConvCNP) with an improved functional DP mechanism of Hall et al. [2013] yielding the DPConvCNP. DPConvCNP learns from simulated data how to map private data to a DP predictive model in one forward pass, and then provides accurate, well-calibrated predictions. We compare DPConvCNP with a DP Gaussian Process (GP) baseline with carefully tuned hyperparameters. The DPConvCNP outperforms the GP baseline, especially on non-Gaussian data, yet is much faster at test time and requires less tuning.

5.9 · CO · Jun 17, 2021 · Code

Differentially Private Hamiltonian Monte Carlo

Ossi Räisä, Antti Koskela, Antti Honkela

Markov chain Monte Carlo (MCMC) algorithms have long been the main workhorses of Bayesian inference. Among them, Hamiltonian Monte Carlo (HMC) has recently become very popular due to its efficiency resulting from effective use of the gradients of the target distribution. In privacy-preserving machine learning, differential privacy (DP) has become the gold standard in ensuring that the privacy of data subjects is not violated. Existing DP MCMC algorithms either use random-walk proposals, or do not use the Metropolis-Hastings (MH) acceptance test to ensure convergence without decreasing their step size to zero. We present a DP variant of HMC using the MH acceptance test that builds on a recently proposed DP MCMC algorithm called the penalty algorithm, and adds noise to the gradient evaluations of HMC. We prove that the resulting algorithm converges to the correct distribution, and is ergodic. We compare DP-HMC with the existing penalty, DP-SGLD and DP-SGNHT algorithms, and find that DP-HMC performs at least as well as the penalty algorithm, and performs more consistently than DP-SGLD or DP-SGNHT.
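For context, the sketch below shows one step of ordinary, non-private HMC in one dimension: a leapfrog trajectory driven by gradients of the log density, followed by the MH acceptance test. It is not DP-HMC; according to the abstract, the DP variant adds noise to the gradient evaluations and builds its acceptance test on the penalty algorithm, neither of which is implemented here. All function names are hypothetical.

```python
import math
import random


def leapfrog(q, p, grad_log_density, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with unit mass."""
    p = p + 0.5 * step_size * grad_log_density(q)  # initial half step
    for i in range(n_steps):
        q = q + step_size * p
        if i < n_steps - 1:
            p = p + step_size * grad_log_density(q)
    p = p + 0.5 * step_size * grad_log_density(q)  # final half step
    return q, p


def hmc_step(q, log_density, grad_log_density, step_size, n_steps, rng):
    """One non-private HMC transition with an exact MH acceptance test."""
    p0 = rng.gauss(0.0, 1.0)  # resample the momentum
    q_new, p_new = leapfrog(q, p0, grad_log_density, step_size, n_steps)
    h_old = -log_density(q) + 0.5 * p0 * p0
    h_new = -log_density(q_new) + 0.5 * p_new * p_new
    accept_prob = math.exp(min(0.0, h_old - h_new))
    return q_new if rng.random() < accept_prob else q


if __name__ == "__main__":
    rng = random.Random(0)
    log_density = lambda x: -0.5 * x * x  # standard normal, up to a constant
    grad = lambda x: -x
    q, samples = 0.0, []
    for _ in range(5000):
        q = hmc_step(q, log_density, grad, 0.2, 10, rng)
        samples.append(q)
    print(sum(samples) / len(samples))
```

The acceptance test is what corrects for discretisation error in the trajectory, which is why the abstract stresses keeping it rather than shrinking the step size to zero.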