Andy Dong

h-index: 30

4 papers

3,088 citations

Novelty: 52%

AI Score: 47

Ranked #30,516 of 194,257 authors (top 16%); #7,235 in LG (top 18%)

4 Papers

7.6 LG May 8 Code

Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?

Andy Dong, Ayfer Özgür

Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in exactly a fixed number of iterations, achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum ($σ\to 0$ and $σ\to \infty$). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations. To translate this asymptotic theory into finite-noise guarantees, we introduce a practical near-exact Monte Carlo accountant for BIS, which removes the analytical slack of existing RDP and composition-based PLD analyses. Evaluations across more than 60 practical DP-SGD configurations show that BIS consistently outperforms Poisson subsampling in the low-noise regimes most relevant for high-utility private training, reducing the required noise multiplier by up to $9.6\%$. These results overturn the common intuition that more sampling randomness necessarily yields stronger privacy amplification: in DP-SGD, structured participation can be both more practical and more private. Our implementation is available at https://github.com/dong-xin-ao-andy/bis-mc-accountant.
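The participation-variance contrast described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation or accountant: it assumes BIS picks each sample's `k` iterations uniformly at random without replacement, and it matches the Poisson rate to `k / T` so both schemes have the same marginal participation probability. The function names and the values of `T`, `n`, and `k` are hypothetical.

```python
import random
from statistics import pvariance


def poisson_participation(num_samples, num_iters, rate, rng):
    """Per-sample participation counts when each sample joins each
    iteration independently with probability `rate` (Poisson subsampling)."""
    return [
        sum(rng.random() < rate for _ in range(num_iters))
        for _ in range(num_samples)
    ]


def balanced_schedule(num_samples, num_iters, k, rng):
    """Assumed form of BIS: each sample joins exactly k distinct
    iterations, chosen uniformly at random without replacement."""
    return [sorted(rng.sample(range(num_iters), k)) for _ in range(num_samples)]


rng = random.Random(0)
T, n, k = 100, 1000, 10  # hypothetical sizes, chosen only for illustration

poisson_counts = poisson_participation(n, T, k / T, rng)
bis_counts = [len(iters) for iters in balanced_schedule(n, T, k, rng)]

# Poisson counts spread around k; BIS counts are all exactly k.
print("Poisson participation variance:", pvariance(poisson_counts))
print("BIS participation variance:", pvariance(bis_counts))
```

The sketch only shows the difference in participation counts. It says nothing about the resulting privacy guarantees, which the paper derives with its own analysis and Monte Carlo accountant.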

8.9 CR Jun 11

Privacy Amplification for BandMF via $b$-Min-Sep Subsampling

Andy Dong, Arun Ganesh

We study privacy amplification for BandMF, i.e., DP-SGD with correlated noise across iterations via a banded correlation matrix. We propose $b$-min-sep subsampling, a new subsampling scheme that generalizes Poisson and balls-in-bins subsampling, extends prior practical batching strategies for BandMF, and enables stronger privacy amplification than cyclic Poisson while preserving the structural properties needed for analysis. We give a near-exact privacy analysis using Monte Carlo accounting, based on a dynamic program that leverages the Markovian structure in the subsampling procedure. We show that $b$-min-sep matches cyclic Poisson subsampling in the high noise regime and achieves strictly better guarantees in the mid-to-low noise regime, with experimental results that bolster our claims. We further show that unlike previous BandMF subsampling schemes, our $b$-min-sep subsampling naturally extends to the multi-attribution user-level privacy setting.

15.7 LG Mar 4, 2025

Leveraging Randomness in Model and Data Partitioning for Privacy Amplification

Andy Dong, Wei-Ning Chen, Ayfer Ozgur

We study how inherent randomness in the training process -- where each sample (or client in federated learning) contributes only to a randomly selected portion of training -- can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce Balanced Iteration Subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.

1.2 DC Sep 27, 2013

The failure tolerance of mechatronic software systems to random and targeted attacks

Dharshana Kasthurirathna, Andy Dong, Mahendrarajah Piraveenan et al.

This paper describes a complex networks approach to study the failure tolerance of mechatronic software systems under various types of hardware and/or software failures. We produce synthetic system architectures based on evidence of modular and hierarchical modular product architectures and known motifs for the interconnection of physical components to software. The system architectures are then subject to various forms of attack. The attacks simulate failure of critical hardware or software. Four types of attack are investigated: degree centrality, betweenness centrality, closeness centrality and random attack. Failure tolerance of the system is measured by a 'robustness coefficient', a topological 'size' metric of the connectedness of the attacked network. We find that the betweenness centrality attack results in the most significant reduction in the robustness coefficient, confirming betweenness centrality, rather than the number of connections (i.e. degree), as the most conservative metric of component importance. A counter-intuitive finding is that "designed" system architectures, including a bus, ring, and star architecture, are not significantly more failure-tolerant than interconnections with no prescribed architecture, that is, a random architecture. Our research provides a data-driven approach to engineer the architecture of mechatronic software systems for failure tolerance.
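The attack procedure described in this abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's method: the abstract does not define the robustness coefficient, so `robustness_score` below is a stand-in (the mean fraction of nodes left in the largest connected component as nodes are removed one by one). Only the degree-based and random attacks are shown, the degree ranking is computed once on the intact graph, and the star topology and all function names are hypothetical.

```python
import random
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def robustness_score(adj, order):
    """Stand-in metric (an assumption, not the paper's definition): mean
    largest-component fraction over the removal sequence `order`."""
    n = len(adj)
    removed = set()
    sizes = []
    for node in order:
        removed.add(node)
        sizes.append(largest_component_size(adj, removed))
    return sum(sizes) / (n * len(order))


# Hypothetical star architecture: hub 0 connected to leaves 1..9.
star = {0: set(range(1, 10)), **{leaf: {0} for leaf in range(1, 10)}}

# Degree attack: remove nodes in descending order of initial degree.
degree_order = sorted(star, key=lambda v: -len(star[v]))

# Random attack: remove nodes in a shuffled order.
random_order = list(star)
random.Random(0).shuffle(random_order)

degree_score = robustness_score(star, degree_order)
random_score = robustness_score(star, random_order)
print("degree attack:", degree_score, "random attack:", random_score)
```

On a star, the degree attack removes the hub first and fragments the network immediately, so its score can never exceed that of a random removal order. Betweenness and closeness attacks would follow the same pattern with a different ranking function.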