ML | LG | Dec 6, 2023

Balanced Marginal and Joint Distributional Learning via Mixture Cramer-Wold Distance

arXiv:2312.03307v1
h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for effective synthetic data generation in machine learning by improving distributional learning, though it appears incremental as it builds upon existing Cramer-Wold distance methods.

The paper addresses the problem of measuring discrepancies between high-dimensional distributions in generative models. It introduces the mixture Cramer-Wold distance, which captures both marginal and joint distributional information, and builds on it the CWDAE model, which the authors report performs well at generating synthetic tabular data with an adjustable privacy level.

In the process of training a generative model, it becomes essential to measure the discrepancy between two high-dimensional probability distributions: the generative distribution and the ground-truth distribution of the observed dataset. Recently, there has been growing interest in an approach that involves slicing high-dimensional distributions, with the Cramer-Wold distance emerging as a promising method. However, we have identified that the Cramer-Wold distance primarily focuses on joint distributional learning, whereas understanding marginal distributional patterns is crucial for effective synthetic data generation. In this paper, we introduce a novel measure of dissimilarity, the mixture Cramer-Wold distance. This measure enables us to capture both marginal and joint distributional information simultaneously, as it incorporates a mixture measure with point masses on standard basis vectors. Building upon the mixture Cramer-Wold distance, we propose a new generative model called CWDAE (Cramer-Wold Distributional AutoEncoder), which shows remarkable performance in generating synthetic data when applied to real tabular datasets. Furthermore, our model offers the flexibility to adjust the level of data privacy with ease.
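
The idea of a mixture over projection directions can be illustrated with a small sketch. It is not the paper's estimator. It assumes a simple sorted-sample squared difference as the one-dimensional discrepancy (a sliced-Wasserstein-style stand-in for the Cramer-Wold kernel), and the names `mixture_sliced_distance`, `weight`, and `n_random` are hypothetical. Random unit directions probe the joint structure, while the standard basis vectors, which carry the point masses, compare each marginal directly.

```python
import numpy as np


def one_dim_discrepancy(a, b):
    """Mean squared difference of sorted values for two equal-size 1-D samples.

    Stand-in for a one-dimensional discrepancy; NOT the Cramer-Wold kernel.
    """
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))


def mixture_sliced_distance(x, y, weight=0.5, n_random=64, seed=0):
    """Hypothetical mixture-of-directions sliced discrepancy.

    x, y     : arrays of shape (n, d) with identical shapes.
    weight   : mass placed on the standard basis vectors (marginal term);
               the remaining 1 - weight goes to random unit directions.
    n_random : number of random directions used for the joint term.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2:
        raise ValueError("x and y must be 2-D arrays of the same shape")
    d = x.shape[1]
    rng = np.random.default_rng(seed)

    # Joint term: average discrepancy over random directions on the unit sphere.
    dirs = rng.standard_normal((n_random, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    joint = np.mean([one_dim_discrepancy(x @ v, y @ v) for v in dirs])

    # Marginal term: projecting onto basis vector e_j just selects column j.
    marginal = np.mean(
        [one_dim_discrepancy(x[:, j], y[:, j]) for j in range(d)]
    )
    return float(weight * marginal + (1.0 - weight) * joint)


# Usage: y keeps every marginal of x but breaks the dependence between columns.
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
x = np.column_stack([z, z + 0.1 * rng.standard_normal(500)])
y = np.column_stack([x[:, 0], rng.permutation(x[:, 1])])

marginal_only = mixture_sliced_distance(x, y, weight=1.0)
joint_only = mixture_sliced_distance(x, y, weight=0.0)
```

In this toy setup the marginal-only term cannot distinguish `x` from `y`, because each column holds the same values, while the joint-only term can. Mixing the two, as the abstract describes, lets one objective respond to both kinds of mismatch.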

