Estimation of System Parameters Including Repeated Cross-Sectional Data through Emulator-Informed Deep Generative Model
This work addresses a bottleneck in fields such as politics, economics, and biology, where data are limited and heterogeneous, by offering a new method for improved parameter estimation.
The authors tackle the problem of estimating differential equation parameters from repeated cross-sectional (RCS) data, which is challenging because of heterogeneities in the data. They propose the emulator-informed deep generative model (EIDGM) and report superior accuracy in capturing parameter distributions across models such as exponential growth and the Lorenz system, along with a successful application to experimental amyloid beta data.
Differential equations (DEs) are crucial for modeling the evolution of natural or engineered systems. Traditionally, the parameters in DEs are adjusted to fit data from system observations. However, in fields such as politics, economics, and biology, available data are often independently collected at distinct time points from different subjects (i.e., repeated cross-sectional (RCS) data). Conventional optimization techniques struggle to accurately estimate DE parameters when RCS data exhibit various heterogeneities, leading to a significant loss of information. To address this issue, we propose a new estimation method called the emulator-informed deep-generative model (EIDGM), designed to handle RCS data. Specifically, EIDGM integrates a physics-informed neural network-based emulator that immediately generates DE solutions and a Wasserstein generative adversarial network-based parameter generator that can effectively mimic the RCS data. We evaluated EIDGM on exponential growth, logistic population models, and the Lorenz system, demonstrating its superior ability to accurately capture parameter distributions. Additionally, we applied EIDGM to an experimental dataset of Amyloid beta 40 and beta 42, successfully capturing diverse parameter distribution shapes. This shows that EIDGM can be applied to model a wide range of systems and extended to uncover the operating principles of systems based on limited data.
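To make the RCS setting concrete, the following is a minimal sketch of the problem rather than of EIDGM itself. It assumes an exponential growth model y(t) = y0 * exp(r * t) and a hypothetical bimodal distribution of growth rates (two subpopulations with rates near 0.5 and 1.5). The function names `simulate_rcs` and `fit_single_rate`, the sample sizes, and the time points are all illustrative choices, not taken from the paper. A fresh set of subjects is drawn at each time point, and a single growth rate is then fitted to the per-time means, as a conventional point-estimate approach might do.

```python
import numpy as np


def simulate_rcs(times, n_per_time, rng, y0=1.0):
    """Simulate repeated cross-sectional data: new subjects at every time point."""
    data = {}
    for t in times:
        # Hypothetical heterogeneity: half the subjects grow slowly, half quickly.
        slow = rng.normal(0.5, 0.05, n_per_time // 2)
        fast = rng.normal(1.5, 0.05, n_per_time - n_per_time // 2)
        r = np.concatenate([slow, fast])
        # Each subject is observed once, at time t only.
        data[t] = y0 * np.exp(r * t)
    return data


def fit_single_rate(data, y0=1.0):
    """Least-squares fit of one rate r to the log of the per-time sample means."""
    times = np.array(sorted(data))
    log_means = np.array([np.log(data[t].mean() / y0) for t in times])
    # Model log(mean) = r * t, so the least-squares solution is sum(t*z) / sum(t^2).
    return float(times @ log_means / (times @ times))


rng = np.random.default_rng(0)
rcs = simulate_rcs(times=[1.0, 2.0, 3.0], n_per_time=200, rng=rng)
r_hat = fit_single_rate(rcs)
print(f"single fitted rate: {r_hat:.3f}")
```

Under these assumptions the fitted rate falls between the two subpopulation rates and matches neither, which illustrates the loss of information the abstract describes: a single point estimate cannot represent a multimodal parameter distribution. Estimating the distribution itself is the task EIDGM is designed for.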