AI, LG, CHEM-PH · Jun 13, 2024

Automated Molecular Concept Generation and Labeling with Large Language Models

arXiv:2406.09612v2 · 20 citations
Originality: Highly original
AI Analysis

The work addresses the need for explainable AI in molecular research, offering an automated approach that removes manual concept definition and labeling while improving benchmark performance.

The paper tackles the scarcity of explainable concept-based models in molecular science by introducing AutoMolCo, a framework that uses LLMs to automatically generate and label molecular concepts, enabling simple linear models to outperform GNNs and LLM in-context learning on several benchmarks.

Artificial intelligence (AI) is transforming scientific research, with explainable AI methods like concept-based models (CMs) showing promise for new discoveries. However, in molecular science, CMs are less common than black-box models like Graph Neural Networks (GNNs), due to their need for predefined concepts and manual labeling. This paper introduces the Automated Molecular Concept (AutoMolCo) framework, which leverages Large Language Models (LLMs) to automatically generate and label predictive molecular concepts. Through iterative concept refinement, AutoMolCo enables simple linear models to outperform GNNs and LLM in-context learning on several benchmarks. The framework operates without human knowledge input, overcoming limitations of existing CMs while maintaining explainability and allowing easy intervention. Experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets demonstrate that AutoMolCo-induced explainable CMs are beneficial for molecular science research.
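The generate, label, fit, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM calls are replaced by hypothetical stubs (`CONCEPT_POOL`, `propose_concepts`, `label_concepts`) that count characters in SMILES strings, the data are synthetic, and the refinement rule (drop the concept with the smallest absolute weight, then request one new concept) is an assumption chosen for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for LLM-proposed concepts. A real system would ask an
# LLM for concept names and values; here each concept is a toy SMILES count.
CONCEPT_POOL: Dict[str, Callable[[str], float]] = {
    "string_length": lambda s: float(len(s)),
    "num_nitrogen": lambda s: float(s.count("N")),
    "num_carbon": lambda s: float(s.count("C")),
    "num_oxygen": lambda s: float(s.count("O")),
}

def propose_concepts(tried: List[str], k: int) -> List[str]:
    """Stub for concept generation: return up to k concepts not yet tried."""
    return [name for name in CONCEPT_POOL if name not in tried][:k]

def label_concepts(smiles: List[str], concepts: List[str]) -> List[List[float]]:
    """Stub for concept labeling: one row of concept values per molecule."""
    return [[CONCEPT_POOL[c](s) for c in concepts] for s in smiles]

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(X: List[List[float]], y: List[float], ridge: float = 1e-8) -> List[float]:
    """Least-squares fit with an intercept (last weight) and a tiny ridge term."""
    rows = [r + [1.0] for r in X]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)

def mse(X: List[List[float]], y: List[float], w: List[float]) -> float:
    """Mean squared error of the linear model on (X, y)."""
    preds = [sum(wi * xi for wi, xi in zip(w, r + [1.0])) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def refine(smiles: List[str], y: List[float], k: int = 2,
           rounds: int = 3) -> Tuple[float, List[str], List[float]]:
    """Iteratively swap the weakest concept for a new one; keep the best set."""
    tried: List[str] = []
    concepts = propose_concepts(tried, k)
    tried += concepts
    best = None
    for _ in range(rounds):
        X = label_concepts(smiles, concepts)
        w = fit_linear(X, y)
        err = mse(X, y, w)
        if best is None or err < best[0]:
            best = (err, list(concepts), w)
        new = propose_concepts(tried, 1)
        if not new:
            break
        # Assumed refinement rule: drop the concept with the smallest |weight|.
        weakest = min(range(len(concepts)), key=lambda i: abs(w[i]))
        concepts = concepts[:weakest] + concepts[weakest + 1:] + new
        tried += new
    return best

# Synthetic data: the target depends only on the oxygen count.
SMILES = ["CCO", "CCCO", "OCCO", "CCN", "CC", "OC(=O)C", "NCCO", "CCCC"]
Y = [2.0 * s.count("O") + 1.0 for s in SMILES]

best_err, best_concepts, best_weights = refine(SMILES, Y)
print(best_concepts, best_weights, best_err)
```

Because the final model is linear over named concepts, each weight can be read directly as that concept's contribution to the prediction, and a concept can be removed or overridden by hand, which is the kind of explainability and intervention the abstract refers to.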

Code Implementations: 1 repo
