CHEM-PH · LG · May 17, 2023

Prompt Engineering for Transformer-based Chemical Similarity Search Identifies Structurally Distinct Functional Analogues

arXiv:2305.16330v1 (has code)
Originality: Incremental advance
AI Analysis

This incremental approach may aid in discovering novel structural classes of molecules for drug and dye applications.

The paper addresses the problem of finding molecules that are functionally similar to a query molecule but structurally distinct from it. It develops a prompt engineering strategy for a chemical language model and uses it to retrieve molecules that traditional structure-based similarity searches are unlikely to find.
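Structural distinctness in traditional similarity searches is commonly quantified with the Tanimoto coefficient on binary molecular fingerprints. The following is a minimal sketch, assuming fingerprints are already available as Python sets of on-bit indices (for example, bits from a Morgan/ECFP fingerprint computed with a cheminformatics toolkit). The `distinct_hits` helper, the 0.4 cutoff, and the toy bit sets are hypothetical illustrations, not code, values, or data from the paper.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| on sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: treated as 0.0 here (a convention choice)
    return len(fp_a & fp_b) / len(union)


def distinct_hits(query_fp: set[int], candidates: dict[str, set[int]],
                  threshold: float = 0.4) -> list[str]:
    """Hypothetical helper: names of candidates whose Tanimoto score is below threshold."""
    return [name for name, fp in candidates.items()
            if tanimoto(query_fp, fp) < threshold]


# Toy on-bit sets (assumption): real fingerprints would come from a toolkit.
query = {1, 4, 9, 16, 25}
candidates = {
    "close_analogue": {1, 4, 9, 16, 36},  # Tanimoto 4/6
    "distinct_hit": {2, 3, 5, 25},        # Tanimoto 1/8
}
print(distinct_hits(query, candidates))  # prints ['distinct_hit']
```

Pairing a filter like this with an embedding-based search is one way to flag hits that score high on learned similarity but low on fingerprint overlap, which is the kind of result the paper highlights.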

Chemical similarity searches are widely used in-silico methods for identifying new drug-like molecules. These methods have historically relied on structure-based comparisons to compute molecular similarity. Here, we use a chemical language model to create a vector-based chemical search. We extend implementations by creating a prompt engineering strategy that utilizes two different chemical string representation algorithms: one for the query and the other for the database. We explore this method by reviewing the search results from five drug-like query molecules (penicillin G, nirmatrelvir, zidovudine, lysergic acid diethylamide, and fentanyl) and three dye-like query molecules (acid blue 25, avobenzone, and 2-diphenylaminocarbazole). We find that this novel method identifies molecules that are functionally similar to the query, indicated by the associated patent literature, and that many of these molecules are structurally distinct from the query, making them unlikely to be found with traditional chemical similarity search methods. This method may aid in the discovery of novel structural classes of molecules that achieve target functionality.
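The abstract describes the vector-based search only at a high level. A minimal sketch of the general idea is below: the query string and each database string are embedded separately and the database is ranked by cosine similarity to the query. In the paper the encoder is a transformer chemical language model, and the query and database strings come from two different string representation algorithms that the abstract does not name. Here `embed` is a hypothetical bag-of-character-trigrams stand-in so the code runs without model weights, and the example molecules and SMILES strings are illustrative assumptions only.

```python
import math
from collections import Counter


# Hypothetical stand-in for a chemical language model encoder (assumption):
# maps a string to a bag of character trigrams instead of a learned vector.
def embed(text: str, n: int = 3) -> Counter:
    padded = f"^{text}$"  # mark string boundaries so prefixes/suffixes count
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, database: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank database entries by cosine similarity to the query embedding.

    `query` and the values of `database` may be produced by different string
    representation algorithms; the encoder embeds each side independently.
    """
    q_vec = embed(query)
    scored = [(name, cosine(q_vec, embed(s))) for name, s in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative data (assumption): SMILES strings for a few small molecules.
database = {
    "ethyl propionate": "CCOC(=O)CC",
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
}
for name, score in search("CCOC(=O)C", database, k=2):  # query: ethyl acetate
    print(f"{name}\t{score:.3f}")
```

With a real chemical language model, `embed` would return a dense vector from the model instead of trigram counts; the ranking logic stays the same.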

Code implementations: 1 repo