CHEM-PH | LG | Apr 11, 2023

Bayesian Optimization of Catalysis With In-Context Learning

arXiv:2304.05341v2 | 27 citations | h-index: 23 | Has Code
Originality: Highly original
AI Analysis

This work addresses slow materials discovery in catalysis and materials science by accelerating optimization with a novel AI-driven approach.

The paper uses frozen large language models, via in-context learning, to perform regression with uncertainty estimation, which lets Bayesian optimization navigate materials design spaces without explicit model training. The method matches or outperforms Gaussian processes on benchmarks and, in live experiments, identifies near-optimal catalysts within six iterations from a pool of 3,700 candidates.

Large language models (LLMs) can perform accurate classification with zero or few examples through in-context learning. We extend this capability to regression with uncertainty estimation using frozen LLMs (e.g., GPT-3.5, Gemini), enabling Bayesian optimization (BO) in natural language without explicit model training or feature engineering. We apply this to materials discovery by representing experimental catalyst synthesis and testing procedures as natural language prompts. A key challenge in materials discovery is the need to characterize suboptimal candidates, which slows progress. While BO is effective for navigating large design spaces, standard surrogate models like Gaussian processes assume smoothness and continuity, an assumption that fails in highly non-linear domains such as heterogeneous catalysis. Our task-agnostic BO workflow overcomes this by operating directly in language space, producing interpretable and actionable predictions without requiring structural or electronic descriptors. On benchmarks like aqueous solubility and oxidative coupling of methane (OCM), BO-ICL matches or outperforms Gaussian processes. In live experiments on the reverse water-gas shift (RWGS) reaction, BO-ICL identifies near-optimal multi-metallic catalysts within six iterations from a pool of 3,700 candidates. Our method redefines materials representation and accelerates discovery, with broad applications across catalysis, materials science, and AI. Code: https://github.com/ur-whitelab/BO-ICL.
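The loop the abstract describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BO-ICL implementation from the linked repository. The prompt template, the "Yield" label, the use of repeated sampling to estimate uncertainty, and the upper-confidence-bound acquisition are all assumptions made for the sketch. Every function name is hypothetical, and `make_mock_sampler` stands in for a real call to a frozen LLM.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, float]          # (procedure text, measured value)
Sampler = Callable[[str], float]     # prompt -> one numeric completion


def build_prompt(examples: Sequence[Example], candidate: str) -> str:
    """Format labelled procedures plus the query as a few-shot prompt."""
    blocks = [f"Procedure: {text}\nYield: {value:.2f}" for text, value in examples]
    blocks.append(f"Procedure: {candidate}\nYield:")
    return "\n\n".join(blocks)


def predict_with_uncertainty(sampler: Sampler, prompt: str,
                             n_samples: int = 5) -> Tuple[float, float]:
    """Assumed scheme: sample several completions, report mean and spread."""
    values = [sampler(prompt) for _ in range(n_samples)]
    return statistics.mean(values), statistics.pstdev(values)


def upper_confidence_bound(mean: float, std: float, beta: float = 1.0) -> float:
    """One common acquisition function; chosen here only for illustration."""
    return mean + beta * std


def bo_loop(pool: Sequence[str], oracle: Callable[[str], float],
            sampler: Sampler, n_init: int = 2, n_iter: int = 3,
            seed: int = 0) -> List[Example]:
    """Pick candidates from a fixed pool, one experiment per iteration."""
    rng = random.Random(seed)
    shuffled = list(pool)
    rng.shuffle(shuffled)
    observed = [(c, oracle(c)) for c in shuffled[:n_init]]
    remaining = shuffled[n_init:]
    for _ in range(n_iter):
        if not remaining:
            break
        scores = []
        for candidate in remaining:
            prompt = build_prompt(observed, candidate)
            mean, std = predict_with_uncertainty(sampler, prompt)
            scores.append(upper_confidence_bound(mean, std))
        best = max(range(len(remaining)), key=scores.__getitem__)
        chosen = remaining.pop(best)
        # Run the "experiment" and add the result to the prompt context.
        observed.append((chosen, oracle(chosen)))
    return observed


def make_mock_sampler(seed: int = 0) -> Sampler:
    """Stand-in for an LLM call: returns a random number, ignores the prompt."""
    rng = random.Random(seed)

    def sampler(prompt: str) -> float:
        return rng.uniform(0.0, 1.0)

    return sampler
```

In this sketch nothing is trained: each newly measured candidate is appended to the few-shot context, so the "surrogate model" is just the prompt. Swapping `make_mock_sampler` for a real model call would require parsing a number out of the completion text, which is omitted here.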

Code Implementations: 2 repos