GN LG Nov 2, 2020

Synthetic Data Generation for Economists

arXiv:2011.01374v2, 8 citations
AI Analysis

This paper tackles data access issues for economists and researchers at tech companies, but its contribution is incremental: it reviews existing approaches without introducing new methods.

The paper addresses the reproducibility problem in economic analyses caused by the use of sensitive proprietary data, proposing synthetic data generation as a solution to allow external replication of methodologies.

As more tech companies engage in rigorous economic analyses, we are confronted with a data problem: in-house papers cannot be replicated due to use of sensitive, proprietary, or private data. Readers are left to assume that the obscured true data (e.g., internal Google information) indeed produced the results given, or they must seek out comparable public-facing data (e.g., Google Trends) that yield similar results. One way to ameliorate this reproducibility issue is to have researchers release synthetic datasets based on their true data; this allows external parties to replicate an internal researcher's methodology. In this brief overview, we explore synthetic data generation at a high level for economic analyses.
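The idea of releasing a synthetic dataset so that outsiders can rerun an analysis can be illustrated with a minimal sketch. It is not the paper's method. It assumes a deliberately simple generator (a bivariate normal fitted to the true data), and the "true" data, the helper names `ols_slope` and `synthesize`, and all parameter values are hypothetical. A fitted parametric model like this does not by itself provide any formal privacy guarantee.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def synthesize(xs, ys, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal fitted to the data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    slope = ols_slope(xs, ys)
    # Correlation implied by the fitted slope, then the residual spread of y given x.
    rho = slope * sx / sy
    resid_sd = sy * max(0.0, 1.0 - rho ** 2) ** 0.5
    syn_x = [rng.gauss(mx, sx) for _ in range(n)]
    syn_y = [my + slope * (x - mx) + rng.gauss(0.0, resid_sd) for x in syn_x]
    return syn_x, syn_y


rng = random.Random(0)

# Hypothetical stand-in for the private, internal data (assumption for illustration).
true_x = [rng.gauss(10.0, 2.0) for _ in range(5000)]
true_y = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in true_x]

# The researcher would release only the synthetic pairs, not the true ones.
syn_x, syn_y = synthesize(true_x, true_y, 5000, rng)

# An external party reruns the same regression on the released data.
true_slope = ols_slope(true_x, true_y)
syn_slope = ols_slope(syn_x, syn_y)
print(f"slope on true data:      {true_slope:.3f}")
print(f"slope on synthetic data: {syn_slope:.3f}")
```

In this toy setting the regression slope estimated on the synthetic data should land close to the one estimated on the true data, which is the sense in which an external party can replicate the methodology without seeing the original records.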

