Yamil J. Colón

LG · Apr 6, 2022

Machine learning identification of organic compounds using visible light

Thulasi Bikku, Rubén A. Fritz, Yamil J. Colón et al.

Identifying chemical compounds is essential in several areas of science and engineering. Laser-based techniques are promising for autonomous compound detection because the optical response of materials encodes enough electronic and vibrational information for remote chemical identification. This has been exploited using the fingerprint region of infrared absorption spectra, which contains a dense set of absorption peaks unique to individual molecules and thus facilitates chemical identification. However, optical identification using visible light has not been realized. Using decades of experimental refractive index data for pure organic compounds and polymers, reported in the scientific literature over a broad frequency range from the ultraviolet to the far-infrared, we develop a machine learning classifier that can accurately identify organic species from a single-wavelength dispersive measurement in the visible spectral region, away from absorption resonances. The proposed optical classifier could be applied in autonomous material identification protocols and applications.

CHEM-PH · Mar 20, 2024

Considerations in the use of ML interaction potentials for free energy calculations

Orlando A. Mendible, Jonathan K. Whitmer, Yamil J. Colón

Machine learning force fields (MLFFs) promise to describe the potential energy surface of molecules accurately at the ab initio level of theory with improved computational efficiency. Among MLFFs, equivariant graph neural networks (EQNNs) have shown great promise in accuracy and performance and are the focus of this work. The capability of EQNNs to recover free energy surfaces (FES) has yet to be thoroughly investigated. In this work, we investigate how the distribution of collective variables (CVs) in the training data affects the accuracy of EQNNs in predicting the FES of butane and alanine dipeptide (ADP). We present a generalizable workflow in which training configurations are generated with classical molecular dynamics simulations, and energies and forces are obtained with ab initio calculations. We evaluate how bond and angle constraints in the training data influence the accuracy of EQNN force fields in reproducing the FES of the molecules at both classical and ab initio levels of theory. Results indicate that the model's accuracy is unaffected by the distribution of sampled CVs during training, provided that the training data include configurations from characteristic regions of the system's FES. However, when the training data are obtained from classical simulations, the EQNN struggles to extrapolate the free energy of high-free-energy configurations. In contrast, models trained on the same configurations with ab initio data show improved extrapolation accuracy. The findings underscore the difficulty of creating a comprehensive training dataset for EQNNs to predict FESs and highlight the importance of prior knowledge of the system's FES.
