LG | Aug 20, 2025

CaTE Data Curation for Trustworthy AI

Mary Versa Clemens-Sewall, Christopher Cervantes, Emma Rafkin, J. Neil Otte, Tom Magelinski, Libby Lewis, Michelle Liu, Dana Udwin, Monique Kirkman-Bey

arXiv:2508.14741v14.1 | h-index: 1 | Has Code

Originality: Synthesis-oriented

AI Analysis

The report addresses the problem of building trustworthy AI systems and is aimed at developers and data scientists, but its contribution is incremental because it compiles existing methods rather than proposing new ones.

This report provides practical guidance for teams on how to promote trustworthiness during the data curation phase of AI system development, synthesizing tools and approaches from academic literature to offer a coherent set of practices.

This report provides practical guidance to teams designing or developing AI-enabled systems on how to promote trustworthiness during the data curation phase of development. We first define data, the data curation phase, and trustworthiness. We then describe a series of steps that the development team, especially data scientists, can take to build a trustworthy AI-enabled system. We enumerate the sequence of core steps and trace parallel paths where alternatives exist. The descriptions of these steps include strengths, weaknesses, preconditions, outcomes, and relevant open-source software tool implementations. In total, this report is a synthesis of data curation tools and approaches from relevant academic literature, and our goal is to equip readers with a diverse yet coherent set of practices for improving AI trustworthiness.
