Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery

arXiv:2606.02632 · 31.4

Predicted impact: top 55% in ML (last 90 days) · Originality: incremental advance

AI Analysis

For scientists and ML practitioners using LLMs for hypothesis generation, the paper highlights a fundamental underdetermination problem that threatens the validity of mechanistic claims.

The paper argues that mechanistic learning from observational data is generically underdetermined in high-dimensional proxy regimes, making predictive success insufficient for mechanism discovery, especially with LLMs that collapse explanation classes. It proposes standards for mechanistic ML to ensure LLMs support rather than simulate science.

Modern machine learning (ML) and artificial intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with LLMs, which tend to collapse large equivalence classes of explanations into a single fluent narrative. The paper proposes concrete standards for "mechanistic ML" and argues that these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.
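The underdetermination claim can be illustrated with a toy case. The sketch below is not taken from the paper; it assumes a two-variable, zero-mean linear-Gaussian setting, and all function names are hypothetical. It builds two incompatible mechanisms (X causes Y, and Y causes X) whose implied observational moments coincide, so no amount of observational fit can separate them, even though they disagree about the effect of intervening on X.

```python
import math

def moments_x_causes_y(a, noise_var):
    """Implied (var_x, cov_xy, var_y) when X ~ N(0, 1) and Y = a*X + noise."""
    return (1.0, a, a * a + noise_var)

def moments_y_causes_x(a, noise_var):
    """Implied moments for the reversed mechanism X = b*Y + noise.

    The coefficient b and the residual variance are chosen (an assumption
    of this sketch) so that the observational moments match the forward model.
    """
    var_y = a * a + noise_var
    b = a / var_y
    resid_var = noise_var / var_y
    var_x = b * b * var_y + resid_var
    return (var_x, b * var_y, var_y)

def expected_y_under_do_x(mechanism, a, x_value):
    """Expected Y under the intervention do(X = x_value)."""
    # If Y causes X, forcing X to a value leaves Y at its mean of zero.
    return a * x_value if mechanism == "x_causes_y" else 0.0

a, noise_var = 0.8, 0.36
forward = moments_x_causes_y(a, noise_var)
reverse = moments_y_causes_x(a, noise_var)

# Zero-mean Gaussians are fully described by these moments, so equal
# moments mean identical observational distributions.
observationally_equivalent = all(
    math.isclose(f, r) for f, r in zip(forward, reverse)
)
effect_forward = expected_y_under_do_x("x_causes_y", a, 1.0)
effect_reverse = expected_y_under_do_x("y_causes_x", a, 1.0)
```

Here `observationally_equivalent` is true while the two interventional predictions differ, which is the gap between predictive success and mechanism discovery that the abstract describes. The paper's argument concerns high-dimensional proxy regimes, which this two-variable case only gestures at.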
