Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components
For AI4Science practitioners, it identifies a fundamental flaw in current workflows that undermines reproducibility and uncertainty quantification.
The paper argues that measurement-to-dataset pipelines in AI for Science should be treated as inference components, not fixed interfaces, and demonstrates via a neuroscience audit that only ~0.0004% of pipelines survive cross-dataset stability tests.
AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains relying on \emph{indirect observation}, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. \textbf{We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as ``given data'' freezes an observation model and obscures uncertainty over feasible pipeline choices.} We identify three failure modes arising from this ``frozen lens'': \textbf{(C1) hidden hypothesis space}, where the released dataset does not specify the pipeline configuration or its validity conditions; \textbf{(C2) uncertified transportability}, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; \textbf{(C3) ungoverned multiplicity}, where many defensible pipelines exist and dispersion is real but not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale neuroscience empirical audit, finding a survival rate of $\approx 0.0004\%$ under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines \emph{computable} inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.