Sparse Data Generation Using Diffusion Models
This paper addresses sparse data generation for domains such as economics, recommender systems, astronomy, and biomedical sciences, proposing a new method for a known bottleneck: efficiently generating high-fidelity synthetic sparse data.
The paper introduces Sparse Data Diffusion (SDD), which extends continuous state-space diffusion models with Sparsity Bits that explicitly represent exact zeros. In experiments across several domains, including two scientific applications in physics and biology, SDD reproduces data sparsity with high fidelity while preserving the quality of the generated data.
Sparse data is ubiquitous, appearing in numerous domains, from economics and recommender systems to astronomy and biomedical sciences. However, efficiently generating high-fidelity synthetic sparse data remains a significant challenge. We introduce Sparse Data Diffusion (SDD), a novel method for generating sparse data. SDD extends continuous state-space diffusion models with an explicit representation of exact zeros by modeling sparsity through the introduction of Sparsity Bits. Empirical validation in various domains, including two scientific applications in physics and biology, demonstrates that SDD achieves high fidelity in representing data sparsity while preserving the quality of the generated data.
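To make the idea of Sparsity Bits concrete, the following is a minimal sketch of one way such an encoding could work; it is not the paper's implementation. It assumes that each entry is paired with one extra continuous channel set to +1 where the entry is exactly zero and -1 otherwise, and that a generated sample is decoded by thresholding that channel at 0. The function names `encode_with_sparsity_bits` and `decode_with_sparsity_bits`, the ±1 convention, and the Gaussian perturbation standing in for an imperfect diffusion sample are all illustrative assumptions.

```python
import numpy as np


def encode_with_sparsity_bits(x):
    """Pair each value with a sparsity bit (assumed convention: +1 = exact zero, -1 = nonzero)."""
    x = np.asarray(x, dtype=float)
    bits = np.where(x == 0, 1.0, -1.0)
    # Last axis holds (value, sparsity bit); a continuous diffusion model
    # could then be trained on this dense, real-valued representation.
    return np.stack([x, bits], axis=-1)


def decode_with_sparsity_bits(z):
    """Threshold the bit channel and force exact zeros where the bit is positive."""
    values, bits = z[..., 0], z[..., 1]
    return np.where(bits > 0, 0.0, values)


# Hypothetical sparse vector.
x = np.array([0.0, 3.5, 0.0, 1.2])
z = encode_with_sparsity_bits(x)

# Stand-in for an imperfect generated sample: small Gaussian noise on both channels.
rng = np.random.default_rng(0)
noisy = z + 0.1 * rng.standard_normal(z.shape)

recovered = decode_with_sparsity_bits(noisy)
print(recovered)
```

Under these assumptions, the perturbed value channel alone would almost never contain an exact zero, whereas thresholding the bit channel restores the exact zeros at the original positions. This is the motivation for representing sparsity explicitly rather than expecting a continuous model to output values that are exactly 0.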