AI, LG · Feb 22, 2025

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Ang Chen

arXiv:2502.16069v2 · 25.128 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

Curie addresses the problem of ensuring rigor in AI-driven scientific experimentation. The advance is incremental because it builds on existing LLM capabilities.

The paper tackles the challenge of automating rigorous scientific experimentation with AI agents by proposing Curie, a framework that improves reliability, control, and interpretability, achieving a 3.4× improvement in correctly answering experimental questions compared to the strongest baseline.

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
