SHA-256 Infused Embedding-Driven Generative Modeling of High-Energy Molecules in Low-Data Regimes
This work addresses the discovery of high-energy materials for propulsion and defense applications, which is hindered by scarce experimental data. It represents an incremental advance in generative modeling for chemistry.
The paper tackles the challenge of discovering high-energy molecules from limited data by introducing a novel embedding strategy that combines fixed SHA-256 embeddings with trainable representations. The generator achieves 67.5% validity and 37.5% novelty, and the authors identify 37 new super explosives with predicted detonation velocities above 9 km/s.
High-energy materials (HEMs) are critical for the propulsion and defense domains, yet their discovery remains constrained by scarce experimental data and restricted access to testing facilities. This work presents a novel approach to discovering high-energy molecules that combines Long Short-Term Memory (LSTM) networks for molecular generation with Attentive Graph Neural Networks (GNNs) for property prediction. We propose a transformative strategy for constructing the embedding space that integrates fixed SHA-256 embeddings with partially trainable representations. Unlike conventional regularization techniques, this strategy changes the representational basis itself, reshaping the molecular input space before learning begins. Without recourse to pretraining, the generator achieves 67.5% validity and 37.5% novelty. The generated library exhibits a mean Tanimoto coefficient of 0.214 relative to the training set, indicating that the framework can generate a diverse chemical space. We identified 37 new super explosives with predicted detonation velocities above 9 km/s.
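The idea of pairing a fixed, hash-derived embedding with a trainable one can be illustrated with a minimal Python sketch. The abstract does not say how the SHA-256 digest is turned into a vector or how the two parts are combined, so everything below is an assumption made for illustration: the byte-to-float scaling, the concatenation, the dimensions, and the names `sha256_embedding` and `HybridEmbedding` are all hypothetical and are not the authors' implementation.

```python
import hashlib
import random

FIXED_DIM = 32      # one value per SHA-256 digest byte (assumption)
TRAINABLE_DIM = 8   # size of the learnable part (assumption)


def sha256_embedding(token):
    """Deterministic, non-trainable vector derived from a token's SHA-256 digest."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # always 32 bytes
    # Scale each byte from [0, 255] to [-1, 1]; this mapping is an assumption.
    return [b / 127.5 - 1.0 for b in digest]


class HybridEmbedding:
    """Hypothetical lookup that concatenates fixed and trainable parts per token."""

    def __init__(self, vocab, trainable_dim=TRAINABLE_DIM, seed=0):
        rng = random.Random(seed)
        # Fixed part: computed once from the hash and never updated.
        self.fixed = {tok: sha256_embedding(tok) for tok in vocab}
        # Trainable part: small random values that an optimizer would update.
        self.trainable = {
            tok: [rng.uniform(-0.1, 0.1) for _ in range(trainable_dim)]
            for tok in vocab
        }

    def __call__(self, token):
        # Concatenation is one possible way to combine the two parts.
        return self.fixed[token] + self.trainable[token]


def embed_sequence(embedding, tokens):
    """Embed a tokenized string (for example, SMILES characters) token by token."""
    return [embedding(tok) for tok in tokens]


vocab = ["C", "N", "O", "(", ")", "=", "[", "]", "+", "-"]
embedding = HybridEmbedding(vocab)
vectors = embed_sequence(embedding, ["C", "N", "(", "=", "O", ")"])
```

In this sketch the hash-derived part gives every token a fixed, data-independent coordinate system, while only the small trainable part would be adjusted during training. In a real model the trainable part would typically be a learnable embedding layer in a deep learning framework rather than a plain dictionary.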