CR | AI | LG | May 31, 2025

dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation

arXiv:2506.00322v1 | 1 citation | h-index: 8 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides a practical tool for researchers and practitioners needing privacy-preserving synthetic data, though it is incremental as it packages existing methods into a library.

The authors address the problem of generating synthetic tabular data with differential privacy guarantees by developing an open-source library, dpmm. It includes three marginal models (PrivBayes, MST, and AIM), which the authors report achieve superior utility and offer richer functionality than alternative implementations.

We propose dpmm, an open-source library for synthetic data generation with Differentially Private (DP) guarantees. It includes three popular marginal models -- PrivBayes, MST, and AIM -- that achieve superior utility and offer richer functionality compared to alternative implementations. Additionally, we adopt best practices to provide end-to-end DP guarantees and address well-known DP-related vulnerabilities. Our goal is to accommodate a wide audience with easy-to-install, highly customizable, and robust model implementations. Our codebase is available from https://github.com/sassoftware/dpmm.
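To make the idea behind marginal models concrete, here is a minimal sketch of the basic building block they share: measure a marginal (a histogram over one column) with calibrated noise, then sample synthetic values from it. This is not dpmm's API and none of the names below come from the library. The function names, the Laplace mechanism, the add/remove-one-record neighbouring definition, and the toy data are all assumptions made for illustration. PrivBayes, MST, and AIM go much further, selecting and combining many multi-way marginals. The sketch also uses ordinary floating-point noise, so it does not address the DP-related vulnerabilities the abstract mentions.

```python
import numpy as np


def noisy_marginal(values, domain_size, epsilon, rng):
    """Return a Laplace-noised one-way marginal as a probability vector.

    Assumes add/remove-one-record neighbouring datasets, under which a
    histogram has L1 sensitivity 1, so the Laplace scale is 1 / epsilon.
    """
    counts = np.bincount(values, minlength=domain_size).astype(float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=domain_size)
    # Post-processing (clipping, normalizing) does not consume privacy budget.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return np.full(domain_size, 1.0 / domain_size)
    return noisy / total


def sample_synthetic(probs, n_records, rng):
    """Draw synthetic category codes from the noisy marginal."""
    return rng.choice(len(probs), size=n_records, p=probs)


rng = np.random.default_rng(0)
domain_size = 4                                     # hypothetical column with 4 categories
real = rng.integers(0, domain_size, size=1000)      # toy stand-in for a sensitive column
probs = noisy_marginal(real, domain_size, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(probs, n_records=500, rng=rng)  # 500 is a public, user-chosen size
```

The synthetic sample size is fixed in advance rather than copied from the real data, since under this neighbouring definition the true record count is itself sensitive.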

Code Implementations: 1 repo
