STAT-MECH · Mar 1, 2022
Molecular Dynamics of Polymer-lipids in Solution from Supervised Machine Learning
James Andrews, Olga Gkountouna, Estela Blaisten-Barojas
Machine learning techniques, including neural networks, are popular tools for materials and chemical scientists, with applications that may provide viable alternative methods for analyzing the structure and energetics of systems ranging from crystals to biomolecules. However, efforts to predict dynamics are less abundant. Here we explore the ability of three well-established recurrent neural network architectures to forecast the energetics of a macromolecular polymer-lipid aggregate solvated in ethyl acetate at ambient conditions. Data models generated from recurrent neural networks are trained and tested on nanoseconds-long time series of the intra-macromolecular potential energy and the macromolecules' interaction energy with the solvent, generated from Molecular Dynamics and containing half a million points. Our exhaustive analyses show that the three recurrent neural networks investigated generate data models with limited capability of reproducing the energetic fluctuations, and yield short- or long-term energetics forecasts whose underlying distribution of points is inconsistent with the input series distributions. We propose an in silico experimental protocol consisting of forming an ensemble of artificial network models trained on an ensemble of series with additional features from time series containing pre-clustered time patterns of the original series. The forecast process improves by predicting a band of forecasted time series with a spread of values consistent with the span of the molecular dynamics energy fluctuations. However, the distribution of points from the band of forecasts is not optimal. Although the three inspected recurrent neural networks were unable to generate single models that reproduce the actual fluctuations of the inspected molecular system energies in thermal equilibrium at the nanosecond scale, the proposed protocol provides useful estimates of the molecular fate.
NE · Apr 29
Uncertainty-Aware Offline Data-Driven Multi-Objective Optimization
Huanbo Lyu, Miqing Li, Shiqiao Zhou et al.
In offline data-driven multi-objective optimization (MOO), optimization is performed using surrogate models trained only on an offline dataset. These surrogate models contain inherent errors and uncertainty. This epistemic uncertainty can lead to incorrect dominance judgments, thereby misleading the search process. Existing methods mitigate this issue by incorporating uncertainty estimates from Gaussian Process Regression (GPR) to correct dominance judgments; however, they are restricted to GPR, and their optimization strategies cannot be scaled to other uncertainty quantification methods. In addition, GPR-based surrogates suffer from high computational cost. We propose a simple yet effective dual-ranking strategy that flexibly leverages both predictive results and uncertainty estimates from different surrogate models. By performing non-dominated sorting on candidate solutions using both surrogate-based fitness values and uncertainty-aware fitness values, the proposed method prioritizes candidate solutions that are simultaneously high-quality and reliable. Through extensive experimental evaluations, including ablation, sensitivity, and comparative experiments, we demonstrate the effectiveness and robustness of the proposed dual-ranking strategy working with different surrogates. Our dual-ranking framework offers more robust solutions for data-limited, real-world applications.
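The dual-ranking idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the uncertainty-aware fitness is built or how the two rankings are combined, so everything below is an assumption made for illustration: objectives are minimized, the uncertainty-aware fitness is taken as `mean + k * std`, and a candidate's final rank is the worse of its two front indices. The function names (`dominates`, `non_dominated_ranks`, `dual_rank`) are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_ranks(points: Sequence[Sequence[float]]) -> List[int]:
    """Front index per point (0 = non-dominated), found by peeling fronts."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    front = 0
    while remaining:
        current = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks


def dual_rank(means, stds, k: float = 1.0) -> List[int]:
    """Combine two non-dominated sortings (illustrative assumption).

    means: surrogate-predicted objective vectors, one per candidate.
    stds:  matching per-objective uncertainty estimates from any surrogate.
    """
    # Assumed uncertainty-aware fitness: a pessimistic bound mean + k * std.
    pessimistic = [[m + k * s for m, s in zip(mu, sd)] for mu, sd in zip(means, stds)]
    rank_pred = non_dominated_ranks(means)
    rank_unc = non_dominated_ranks(pessimistic)
    # Assumed combination rule: a candidate is only as good as its worse rank.
    return [max(a, b) for a, b in zip(rank_pred, rank_unc)]


means = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [1.5, 1.5]]
stds = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [3.0, 3.0]]
combined = dual_rank(means, stds)
```

In this toy example the last candidate looks best on predicted values alone but carries a large uncertainty, so it falls to the second front once the pessimistic ranking is taken into account. Because the sketch only needs a mean and a spread per objective, it does not depend on the surrogate being a Gaussian process.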
GN · Jan 4, 2025
iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic Traits
Zipeng Wu, Daniel Herring, Fabian Spill et al.
Accurately predicting chronological age from DNA methylation patterns is crucial for advancing biological age estimation. However, this task is made challenging by Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC), which reflect the dynamic relationship between methylation and age across different life stages. To address these issues, we propose a novel two-phase algorithm. The first phase employs similarity searching to cluster methylation profiles by age group, while the second phase uses Explainable Boosting Machines (EBM) for precise, group-specific prediction. Our method not only improves prediction accuracy but also reveals key age-related CpG sites, detects age-specific changes in aging rates, and identifies pairwise interactions between CpG sites. Experimental results show that our approach outperforms traditional epigenetic clocks and machine learning models, offering a more accurate and interpretable solution for biological age estimation with significant implications for aging research.
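The two-phase structure described in this abstract (route a profile to an age group by similarity, then apply a group-specific predictor) can be sketched as follows. This is not the authors' implementation: a nearest-neighbour lookup stands in for the similarity search, a one-feature least-squares line on the mean methylation level stands in for the Explainable Boosting Machine, and the 20-year cutoff, the group names, and all function names are hypothetical choices made only to keep the example self-contained.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two methylation profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def age_group(age, cutoff=20.0):
    """Hypothetical two-group split; the abstract does not state the groups."""
    return "young" if age < cutoff else "adult"


def fit_line(xs, ys):
    """Ordinary least squares y = a + b * x for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def train(profiles, ages, cutoff=20.0):
    """Label each training profile with its group and fit one model per group."""
    labels = [age_group(a, cutoff) for a in ages]
    models = {}
    for g in set(labels):
        xs = [mean(p) for p, l in zip(profiles, labels) if l == g]
        ys = [a for a, l in zip(ages, labels) if l == g]
        models[g] = fit_line(xs, ys)  # stand-in for a group-specific EBM
    return labels, models


def predict(query, profiles, labels, models):
    """Phase 1: nearest training profile picks the group. Phase 2: group model."""
    nearest = min(range(len(profiles)), key=lambda i: sq_dist(query, profiles[i]))
    group = labels[nearest]
    a, b = models[group]
    return group, a + b * mean(query)


# Toy data: two CpG sites per profile, ages in years (made up for illustration).
profiles = [[0.10, 0.10], [0.20, 0.20], [0.50, 0.50], [0.70, 0.70]]
ages = [5.0, 15.0, 30.0, 70.0]
labels, models = train(profiles, ages)
group_y, age_y = predict([0.15, 0.15], profiles, labels, models)
group_a, age_a = predict([0.60, 0.60], profiles, labels, models)
```

The point of the routing step is that each group gets its own methylation-to-age relationship, so the model for young samples is free to have a different slope from the adult one. In the toy data the two fitted lines differ by a factor of two in slope.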