Guangye Chen

2 Papers

DC Mar 17, 2021
An unsupervised machine-learning checkpoint-restart algorithm using Gaussian mixtures for particle-in-cell simulations

Guangye Chen, Luis Chacón, Truong B. Nguyen

We propose a lossy, unsupervised machine-learning checkpoint-restart (CR) algorithm for particle-in-cell (PIC) algorithms using Gaussian mixtures (GM). The algorithm features a particle compression stage and a particle reconstruction stage, where a continuum particle distribution function is constructed and resampled, respectively. To guarantee fidelity of the CR process, we ensure the exact preservation of charge, momentum, and energy for both compression and reconstruction stages, everywhere on the mesh. We also ensure the preservation of Gauss' law after particle reconstruction. As a result, the GM CR algorithm is shown to provide a clean, conservative restart capability while potentially affording orders of magnitude savings in input/output requirements. We demonstrate the algorithm using a recently developed exactly energy- and charge-conserving PIC algorithm on physical problems of interest, achieving compression factors $\gtrsim 75$ with no appreciable impact on the quality of the restarted dynamics.
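
As a rough illustration of the compress-then-resample idea, the sketch below reduces the particle velocities in one cell to a single 1D Gaussian and then resamples particles whose first two velocity moments match the originals to roundoff. This is a simplified stand-in and not the authors' algorithm: it assumes equal-weight particles, one velocity dimension, and one Gaussian component instead of a fitted mixture, and it ignores the spatial mesh and Gauss' law. The function names `compress` and `reconstruct` are hypothetical.

```python
import numpy as np

def compress(v):
    """Reduce a cell's particle velocities to (count, mean, variance)."""
    return len(v), float(np.mean(v)), float(np.var(v))

def reconstruct(n, mean, var, rng):
    """Resample n velocities whose sample mean and variance match exactly."""
    s = rng.standard_normal(n)
    # Standardize the draws so their sample mean is 0 and sample variance is 1.
    s = (s - s.mean()) / s.std()
    return mean + np.sqrt(var) * s

rng = np.random.default_rng(0)
v_old = rng.normal(loc=0.3, scale=1.2, size=1000)   # synthetic "particles"

n, mean, var = compress(v_old)          # three numbers stored instead of 1000
v_new = reconstruct(n, mean, var, rng)  # restart: resample the particles

# For equal-mass particles, sum(v) is proportional to momentum
# and sum(v**2) to kinetic energy.
momentum_error = abs(v_old.sum() - v_new.sum())
energy_error = abs((v_old**2).sum() - (v_new**2).sum())
```

Storing three numbers per cell instead of every particle is where the input/output savings come from in this toy setting; the standardization step is what makes the first two moments match exactly rather than only statistically.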

LG Sep 26, 2020
An Adaptive EM Accelerator for Unsupervised Learning of Gaussian Mixture Models

Truong Nguyen, Guangye Chen, Luis Chacón

We propose an Anderson Acceleration (AA) scheme for the adaptive Expectation-Maximization (EM) algorithm for unsupervised learning of a finite mixture model from multivariate data (Figueiredo and Jain 2002). The proposed algorithm is able to determine the optimal number of mixture components autonomously, and converges to the optimal solution much faster than its non-accelerated version. The success of the AA-based algorithm stems from several developments rather than a single breakthrough (and without these, our tests demonstrate that AA fails catastrophically). To begin, we ensure the monotonicity of the likelihood function (a key feature of the standard EM algorithm) with a recently proposed monotonicity-control algorithm (Henderson and Varadhan 2019), enhanced by a novel monotonicity test with little overhead. We propose nimble strategies for AA to preserve the positive definiteness of the Gaussian weights and covariance matrices strictly, and to conserve up to the second moments of the observed data set exactly. Finally, we employ a K-means clustering algorithm using the gap statistic to avoid excessively overestimating the initial number of components, thereby maximizing performance. We demonstrate the accuracy and efficiency of the algorithm with several synthetic data sets that are mixtures of Gaussian distributions with a known number of components, as well as data sets generated from particle-in-cell simulations. Our numerical results demonstrate speed-ups with respect to non-accelerated EM of up to 60X when the exact number of mixture components is known, and from a few times to more than an order of magnitude with component adaptivity.
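
For readers unfamiliar with Anderson Acceleration, the sketch below shows the basic scheme applied to a generic fixed-point map `g`, of which an EM update is one example. It is a minimal textbook-style version under stated assumptions, not the algorithm of the paper: it has no monotonicity control, no positivity safeguards, and no moment conservation, and the test map here is a small linear contraction chosen only for illustration. The function name `anderson` and the parameters `m`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np

def anderson(g, x0, m=3, tol=1e-10, max_iter=100):
    """Anderson acceleration of x <- g(x) with a history window m >= 1."""
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    f = gx - x                      # fixed-point residual
    dF, dG = [], []                 # histories of residual and map differences
    for k in range(max_iter):
        if np.linalg.norm(f) < tol:
            return x, k
        if dF:
            F = np.column_stack(dF)
            G = np.column_stack(dG)
            # Least-squares mixing coefficients that minimize the
            # linearized residual over the stored history.
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new = gx - G @ gamma
        else:
            x_new = gx              # first step is a plain fixed-point update
        g_new = g(x_new)
        f_new = g_new - x_new
        dF.append(f_new - f)
        dG.append(g_new - gx)
        dF, dG = dF[-m:], dG[-m:]   # keep only the last m differences
        x, gx, f = x_new, g_new, f_new
    return x, max_iter

# Illustrative linear contraction g(x) = A x + b (an assumption, not an EM map).
A = np.diag([0.9, 0.5, 0.1])
b = np.ones(3)
g = lambda x: A @ x + b

x_acc, iters = anderson(g, np.zeros(3))
x_star = np.linalg.solve(np.eye(3) - A, b)   # exact fixed point for comparison
```

On this toy map the plain iteration contracts by only a factor of 0.9 per step in its slowest direction, so it needs a few hundred iterations to reach the same tolerance, which is the kind of gap that motivates accelerating EM.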