MLLGFeb 17, 2023

Copula-based transferable models for synthetic population generation

arXiv:2302.09193v318 citationsh-index: 27
Originality Incremental advance
AI Analysis

This addresses the high costs and small sample sizes in population synthesis for behavioral modeling, particularly at smaller geographical scales, though it is incremental as it builds on existing generative methods.

The paper tackles the problem of generating synthetic populations when only marginal distributions are known, by proposing a copula-based framework that transfers models between populations and spatial scales, achieving lower standardized root mean squared error (SRMSE) than Iterative Proportional Fitting and enhancing machine learning methods in matching marginals.

Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents for behavioral modeling and simulation. Traditional methods, often reliant on target population samples, such as census data or travel surveys, face limitations due to high costs and small sample sizes, particularly at smaller geographical scales. We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known. This method utilizes samples from different populations with similar marginal dependencies, introduces a spatial component into population synthesis, and considers various information sources for more realistic generators. Concretely, the process involves normalizing the data and treating it as realizations of a given copula, and then training a generative model before incorporating the information on the marginals of the target population. Utilizing American Community Survey data, we assess our framework's performance through standardized root mean squared error (SRMSE) and so-called sampled zeros. We focus on its capacity to transfer a model learned from one population to another. Our experiments include transfer tests between regions at the same geographical level as well as to lower geographical levels, hence evaluating the framework's adaptability in varied spatial contexts. We compare Bayesian Networks, Variational Autoencoders, and Generative Adversarial Networks, both individually and combined with our copula framework. Results show that the copula enhances machine learning methods in matching the marginals of the reference data. Furthermore, it consistently surpasses Iterative Proportional Fitting in terms of SRMSE in the transferability experiments, while introducing unique observations not found in the original training sample.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes