CL · AI · LG · CHEM-PH · QM · May 2, 2024

CACTUS: Chemistry Agent Connecting Tool-Usage to Science

arXiv:2405.00972v1 · 45 citations · h-index: 7 · Has Code · ACS Omega
Originality: Incremental advance
AI Analysis

CACTUS gives researchers in chemistry and molecular discovery an adaptable tool for accelerating tasks such as molecular property prediction and drug-likeness assessment. The advance is incremental, since it combines existing LLMs with existing domain-specific tools.

The paper addresses the lack of domain-specific chemistry knowledge and tools in LLMs by introducing CACTUS, an LLM-based agent that integrates cheminformatics tools. On a benchmark of thousands of chemistry questions, CACTUS significantly outperforms baseline LLMs, with Gemma-7b and Mistral-7b achieving the highest accuracy.

Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.
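The tool-dispatch idea described in the abstract can be illustrated with a minimal sketch. This is not the CACTUS implementation: the keyword-based `pick_tool` function is a hypothetical stand-in for the LLM's tool selection, the `Descriptors` values are assumed to be precomputed by a cheminformatics toolkit, and the drug-likeness check uses the common convention that at most one Lipinski rule-of-five violation counts as passing.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float       # molecular weight in g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def tool_mol_weight(d: Descriptors) -> str:
    return f"Molecular weight: {d.mol_weight:.2f} g/mol"


def tool_druglikeness(d: Descriptors) -> str:
    n = lipinski_violations(d)
    # Assumed convention: at most one violation still counts as drug-like.
    verdict = "passes" if n <= 1 else "fails"
    return f"{verdict} Lipinski's rule of five ({n} violation(s))"


# Tool registry: keyword -> tool function. An agent would expose each tool's
# name and description to the LLM rather than match keywords.
TOOLS: Dict[str, Callable[[Descriptors], str]] = {
    "molecular weight": tool_mol_weight,
    "druglike": tool_druglikeness,
}


def pick_tool(question: str) -> str:
    """Hypothetical stand-in for the LLM deciding which tool to call."""
    q = question.lower()
    for keyword in TOOLS:
        if keyword in q:
            return keyword
    raise ValueError(f"No tool matches question: {question!r}")


def answer(question: str, d: Descriptors) -> str:
    """Route the question to a tool and return the tool's output."""
    return TOOLS[pick_tool(question)](d)


if __name__ == "__main__":
    # Illustrative, approximate descriptor values for a small molecule.
    mol = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
    print(answer("What is the molecular weight?", mol))
    print(answer("Is this molecule druglike?", mol))
```

The point of the sketch is the separation of concerns: the language model only has to choose a tool and phrase the answer, while the numerical work is delegated to deterministic functions.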

Code Implementations: 1 repo
