LG · AI · ME · Jun 18, 2025

PCS Workflow for Veridical Data Science in the Age of AI

arXiv:2508.00835v1 · 7 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This paper gives practitioners a principled framework for improving reproducibility in AI and data science, though its contribution is incremental: it updates an existing method.

The paper tackles non-replicable data science findings by introducing an updated Predictability-Computability-Stability (PCS) workflow that addresses the uncertainty arising from choices made throughout the data science life cycle. A case study shows how judgment calls in data cleaning affect prediction uncertainty.

Data science is a pillar of artificial intelligence (AI), which is transforming nearly every domain of human activity, from the social and physical sciences to engineering and medicine. While data-driven findings in AI offer unprecedented power to extract insights and guide decision-making, many are difficult or impossible to replicate. A key reason for this challenge is the uncertainty introduced by the many choices made throughout the data science life cycle (DSLC). Traditional statistical frameworks often fail to account for this uncertainty. The Predictability-Computability-Stability (PCS) framework for veridical (truthful) data science offers a principled approach to addressing this challenge throughout the DSLC. This paper presents an updated and streamlined PCS workflow, tailored for practitioners and enhanced with guided use of generative AI. We include a running example to display the PCS framework in action, and conduct a related case study which showcases the uncertainty in downstream predictions caused by judgment calls in the data cleaning stage.
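The idea that data cleaning judgment calls propagate into downstream predictions can be illustrated with a small sketch. This is not the paper's case study: the synthetic data, the two cleaning choices (mean vs. median imputation, clipping vs. keeping extreme values), the simple least-squares line, and every function name below are assumptions made purely for illustration. The sketch fits the same model on each cleaned version of one raw dataset and reports how much the prediction at a single new point varies.

```python
import itertools
import random
import statistics


def make_raw_data(n=200, seed=0):
    """Synthetic (x, y) rows with some missing x values and some recording errors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 1)
        if rng.random() < 0.10:
            x = None           # missing measurement
        elif rng.random() < 0.05:
            x = x * 10         # hypothetical unit/recording error
        rows.append((x, y))
    return rows


def clean(rows, impute, clip_outliers):
    """Apply one combination of cleaning judgment calls (both are assumptions)."""
    observed = [x for x, _ in rows if x is not None]
    if impute == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    xs = [fill if x is None else x for x, _ in rows]
    if clip_outliers:
        # Cap x at the observed 95th percentile (last of 19 cut points).
        cap = statistics.quantiles(observed, n=20)[-1]
        xs = [min(x, cap) for x in xs]
    ys = [y for _, y in rows]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predictions_across_cleaning_choices(rows, x_new):
    """Refit under every cleaning combination and predict at x_new."""
    preds = {}
    for impute, clip in itertools.product(("mean", "median"), (False, True)):
        xs, ys = clean(rows, impute, clip)
        slope, intercept = fit_line(xs, ys)
        preds[(impute, clip)] = slope * x_new + intercept
    return preds


if __name__ == "__main__":
    preds = predictions_across_cleaning_choices(make_raw_data(), x_new=8.0)
    for choice, value in preds.items():
        print(choice, round(value, 3))
    print("spread:", round(max(preds.values()) - min(preds.values()), 3))
```

The spread between the largest and smallest prediction is one rough summary of how sensitive the result is to the cleaning stage. Under this toy setup, a wide spread would suggest that reporting a single prediction understates the uncertainty.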

