Learning interacting particle systems from unlabeled data
This work addresses a fundamental challenge in scientific disciplines where data-collection limitations or privacy constraints prevent labeling particles across time, offering robust estimation that scales to large, high-dimensional systems.
The paper tackles the problem of learning the potentials of interacting particle systems from unlabeled data, which carry no trajectory information, by introducing a trajectory-free self-test loss function. In numerical tests, the method outperforms baseline approaches while tolerating large observation time steps.
Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled data collected at discrete time points lack trajectory information due to limitations in data collection methods or privacy constraints. We address this challenge by introducing a trajectory-free self-test loss function that leverages the weak-form stochastic evolution equation of the empirical distribution. The loss function is quadratic in the potentials, which supports parametric and nonparametric regression algorithms for robust estimation that scale to large, high-dimensional systems with big data. Systematic numerical tests show that our method outperforms baseline methods that regress on trajectories recovered via label matching, and that it tolerates large observation time steps. We establish the convergence of parametric estimators as the sample size increases, providing a theoretical foundation for the proposed approach.
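The abstract does not state the loss explicitly, so the following is only a minimal sketch of the weak-form idea under our own simplifying assumptions: one dimension, no interaction term (only a confining potential V), known noise level sigma, and dynamics dX = -V'(X) dt + sigma dW. Reading "self-test" as using the candidate potential itself as the test function in Ito's formula for the empirical measure mu_k gives, with a left-point time discretization (also an assumption),

E(V) = sum_k [ dt <mu_k, |V'|^2> - sigma^2 dt <mu_k, V''> + 2 (<mu_{k+1}, V> - <mu_k, V>) ],

whose expectation is, up to O(dt) error, sum_k dt <mu_k, |V' - V_true'|^2> minus a term independent of V. With V_theta = sum_j theta_j phi_j this is E(theta) = theta^T A theta - 2 b^T theta, so fitting reduces to the linear solve A theta = b. Only averages over each snapshot enter, so particle labels are never used. The Ornstein-Uhlenbeck toy data, the two-function basis, and all function names below are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def simulate_unlabeled_snapshots(n_particles, n_obs, dt_obs, sigma, substeps=20, seed=0):
    """Toy data (assumed): dX = -V'(X) dt + sigma dW with V(x) = x^2 / 2 in 1-D, no
    interaction. Fine Euler-Maruyama steps run between observations, and each
    snapshot is randomly permuted, so rows carry no trajectory labels."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=(n_particles, 1))
    h = dt_obs / substeps
    snapshots = [rng.permutation(x)]
    for _ in range(n_obs - 1):
        for _ in range(substeps):
            x = x - x * h + sigma * np.sqrt(h) * rng.normal(size=x.shape)
        snapshots.append(rng.permutation(x))
    return snapshots


def basis(x):
    """Hypothetical basis for V_theta(x) = theta_0 * x^2 / 2 + theta_1 * x.
    Returns values, first derivatives and Laplacians, each of shape (N, K)."""
    x = x[:, 0]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    values = np.stack([0.5 * x**2, x], axis=1)
    grads = np.stack([x, ones], axis=1)
    laplacians = np.stack([ones, zeros], axis=1)
    return values, grads, laplacians


def self_test_system(snapshots, dt, sigma):
    """Assemble A and b so that E(theta) = theta^T A theta - 2 b^T theta.
    Only per-snapshot empirical averages are used, so no labels are needed."""
    k = basis(snapshots[0])[0].shape[1]
    A, b = np.zeros((k, k)), np.zeros(k)
    for x_now, x_next in zip(snapshots[:-1], snapshots[1:]):
        v_now, g_now, lap_now = basis(x_now)
        v_next = basis(x_next)[0]
        A += dt * (g_now.T @ g_now) / len(x_now)         # <mu_k, phi_i' phi_j'>
        b += 0.5 * sigma**2 * dt * lap_now.mean(axis=0)  # Ito correction term
        b -= v_next.mean(axis=0) - v_now.mean(axis=0)    # increment of <mu_k, phi_j>
    return A, b


def self_test_loss(theta, A, b):
    """Quadratic self-test loss in the coefficients theta."""
    return theta @ A @ theta - 2.0 * b @ theta


def fit_potential(snapshots, dt, sigma):
    """Parametric estimator: the unique minimizer when A is positive definite."""
    A, b = self_test_system(snapshots, dt, sigma)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    snaps = simulate_unlabeled_snapshots(n_particles=2000, n_obs=200, dt_obs=0.05, sigma=1.0)
    theta = fit_potential(snaps, dt=0.05, sigma=1.0)
    print(theta)  # should lie close to [1, 0], the coefficients of V(x) = x^2 / 2
```

Because the loss is quadratic, parametric fitting is a single linear solve regardless of how many particles are observed. In this toy setting the estimate should move closer to [1, 0] as n_particles grows and dt_obs shrinks, in line with the convergence statement above, though the sketch makes no claim about the paper's actual rates or its behavior at large observation steps.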