SemaPop: Semantic-Persona Conditioned Population Synthesis
This work addresses the challenge of generating realistic populations for socio-economic simulation by bridging semantic and statistical gaps, though it is incremental as it builds on existing generative models.
The paper tackles the problem of population synthesis by integrating semantic persona conditioning from survey data with statistical constraints, resulting in improved alignment with target distributions and sample-level feasibility.
Population synthesis is a critical component of individual-level socio-economic simulation, yet remains challenging due to the need to jointly represent statistical structure and latent behavioral semantics. Existing population synthesis approaches predominantly rely on structured attributes and statistical constraints, leaving a gap in semantic-conditioned population generation that can capture abstract behavioral patterns implicitly in survey data. This study proposes SemaPop, a semantic-statistical population synthesis model that integrates large language models (LLMs) with generative population modeling. SemaPop derives high-level persona representations from individual survey records and incorporates them as semantic conditioning signals for population generation, while marginal regularization is introduced to enforce alignment with target population marginals. In this study, the framework is instantiated using a Wasserstein GAN with gradient penalty (WGAN-GP) backbone, referred to as SemaPop-GAN. Extensive experiments demonstrate that SemaPop-GAN achieves improved generative performance, yielding closer alignment with target marginal and joint distributions while maintaining sample-level feasibility and diversity under semantic conditioning. Ablation studies further confirm the contribution of semantic persona conditioning and architectural design choices to balancing marginal consistency and structural realism. These results demonstrate that SemaPop-GAN enables controllable and interpretable population synthesis through effective semantic-statistical information fusion. SemaPop-GAN also provides a promising modular foundation for developing generative population projection systems that integrate individual-level behavioral semantics with population-level statistical constraints.