Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery
For researchers building autonomous AI scientists, this paper identifies fundamental design flaws that prevent current systems from achieving true autonomy, though it is a position paper without empirical results.
This position paper argues that current agentic AI scientists are not suitable for fully autonomous scientific discovery due to four key challenges: the McNamara fallacy in problem selection, omission of tacit knowledge in LLM training, diversity compression from preference optimization, and lack of physical experiment feedback. The authors recommend using simulations as verifiers, persistent world models, centralized preregistration, and need-driven application.
A growing body of work pursues AI scientists capable of end-to-end autonomous scientific discovery. This position paper argues that although they already function as co-scientists, agentic AI scientists are not built for autonomous scientific discovery. We identify the following challenges in building and deploying autonomous AI scientists: (1) Problem selection is influenced by the McNamara fallacy; (2) Agents are built on large language models (LLMs) whose training corpora omit tacit procedural and failure knowledge of laboratory practice; (3) Preference optimisation during post-training compresses output diversity toward consensus; and (4) Most scientific benchmarks measure single-turn prediction accuracy and lack feedback from physical experiments back to the computational model. These challenges are not just questions of scale and scaffolding; they require revisiting fundamental design choices. To build truly autonomous AI scientists, we recommend the use of scientific simulations as verifiers for training, the design of persistent world models that represent the shifting objectives governing real investigations, the establishment of a centralized preregistration repository for all AI-generated hypotheses, and application driven by scientific need rather than tool affordance.