SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis
This work addresses a security vulnerability in LLMs used for chemistry applications, highlighting the risk that they can be misused to generate dangerous chemical synthesis information.
The paper examines the problem of LLMs providing instructions for synthesizing hazardous substances. It introduces SMILES-prompting, a novel attack technique that references substances in SMILES notation to bypass safety mechanisms, and reports that the technique effectively evades current safeguards.
The increasing integration of large language models (LLMs) across various fields has heightened concerns about their potential to propagate dangerous information. This paper specifically explores the security vulnerabilities of LLMs within the field of chemistry, particularly their capacity to provide instructions for synthesizing hazardous substances. We evaluate the effectiveness of several prompt injection attack methods, including red-teaming, explicit prompting, and implicit prompting. Additionally, we introduce a novel attack technique named SMILES-prompting, which uses the Simplified Molecular-Input Line-Entry System (SMILES) to reference chemical substances. Our findings reveal that SMILES-prompting can effectively bypass current safety mechanisms. These findings highlight the urgent need for enhanced domain-specific safeguards in LLMs to prevent misuse and improve their potential for positive social impact.