Generative Chemical Language Models for Energetic Materials Discovery
This work provides a transfer-learning framework for data-sparse discovery problems, extending generative language models beyond pharmacology to energetic materials.
The authors developed generative chemical language models pretrained on extensive chemical data and fine-tuned on curated energetic materials datasets, enabling the discovery of new energetic materials despite limited data availability.
The discovery of new energetic materials remains a pressing challenge, hindered by the limited availability of high-quality data. To address this, we have developed generative molecular language models that are pretrained on extensive chemical data and then fine-tuned on curated energetic materials datasets. This transfer-learning strategy extends chemical language model capabilities beyond the pharmacological space in which they have predominantly been developed, offering a framework applicable to other data-sparse discovery problems. Furthermore, we discuss the benefits of fragment-based molecular encodings for chemical language models, particularly for constructing synthetically accessible structures. Together, these advances provide a foundation for accelerating the design of next-generation energetic materials with demanding performance requirements.