Machine Learning Workflow for Analysis of High-Dimensional Order Parameter Space: A Case Study of Polymer Crystallization from Molecular Dynamics Simulations
This provides a data-driven strategy for order parameter selection to monitor structural transformations in polymer simulations, though it appears incremental as it builds on existing simulation and ML techniques.
The researchers tackled the problem of accurately quantifying crystallinity in polymers from molecular dynamics simulations by developing a machine learning workflow that identifies a minimal set of three order parameters, achieving over 0.98 AUC classification performance.
Currently, identification of crystallization pathways in polymers is being carried out using molecular simulation-based data on a preset cut-off point on a single order parameter (OP) to define nucleated or crystallized regions. Aside from sensitivity to cut-off, each of these OPs introduces its own systematic biases. In this study, an integrated machine learning workflow is presented to accurately quantify crystallinity in polymeric systems using atomistic molecular dynamics data. Each atom is represented by a high-dimensional feature vector that combines geometric, thermodynamic-like, and symmetry-based descriptors. Low dimensional embeddings are employed to expose latent structural fingerprints within atomic environments. Subsequently, unsupervised clustering on the embeddings identified crystalline and amorphous atoms with high fidelity. After generating high quality labels with multidimensional data, we use supervised learning techniques to identify a minimal set of order parameters that can fully capture this label. Various tests were conducted to reduce the feature set, demonstrating that using only three order parameters is sufficient to recreate the crystallization labels. Based on these observed OPs, the crystallinity index (C-index) is defined as the logistic regression model's probability of crystallinity, remaining bimodal throughout the process and achieving over 0.98 classification performance (AUC). Notably, a model trained on one or a few snapshots enables efficient on-the-fly computation of crystallinity. Lastly, we demonstrate how the optimal C-index fit evolves during various stages of crystallization, supporting the hypothesis that entropy dominates early nucleation, while symmetry gains relevance later. This workflow provides a data-driven strategy for OP selection and a metric to monitor structural transformations in large-scale polymer simulations.