CR AI | Feb 26, 2025

Atlas: A Framework for ML Lifecycle Provenance & Transparency

Marcin Spoczynski, Marcela S. Melara, Sebastian Szyller

arXiv:2502.19567v2 | 14.612 citations | h-index: 9 | Has Code | 2025 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

Originality: Synthesis-oriented

AI Analysis

This work addresses transparency and security issues facing ML model vendors under regulatory pressure, though it appears incremental because it builds on existing open specifications and tools.

The authors address risks such as data poisoning and supply chain attacks in ML pipelines by proposing Atlas, a framework for attestable ML pipelines that collects verifiable records of model artifact authenticity and lineage metadata. A prototype implementation is evaluated through two case studies.

The rapid adoption of open source machine learning (ML) datasets and models exposes today's AI applications to critical risks like data poisoning and supply chain attacks across the ML lifecycle. With growing regulatory pressure to address these issues through greater transparency, ML model vendors face challenges balancing these requirements against confidentiality needs for data and intellectual property. We propose Atlas, a framework that enables fully attestable ML pipelines. Atlas leverages open specifications for data and software supply chain provenance to collect verifiable records of model artifact authenticity and end-to-end lineage metadata. Atlas combines trusted hardware and transparency logs to enhance metadata integrity, preserve data confidentiality, and limit unauthorized access during ML pipeline operations, from training through deployment. Our prototype implementation of Atlas integrates several open-source tools to build an ML lifecycle transparency system, and we assess the practicality of Atlas through two case study ML pipelines.
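To make the idea of verifiable lineage records concrete, here is a minimal sketch in Python. It is not Atlas's implementation and does not use the open specifications, trusted hardware, or transparency log services the abstract refers to. It only illustrates the general pattern, assuming a record that binds an artifact's SHA-256 digest to the digests of its inputs, and a toy hash-chained log that makes later tampering detectable. The names `make_record` and `HashChainLog` are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def make_record(name, artifact_bytes, parent_digests):
    """Hypothetical lineage record: an artifact digest plus the digests of its inputs."""
    return {
        "name": name,
        "digest": sha256_hex(artifact_bytes),
        "parents": sorted(parent_digests),
    }


class HashChainLog:
    """Toy append-only log (an assumption, not a real transparency log):
    each entry commits to the hash of the previous entry."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        body = json.dumps(record, sort_keys=True)
        entry_hash = sha256_hex((prev + body).encode())
        self.entries.append({"prev": prev, "record": record, "entry_hash": entry_hash})
        return entry_hash

    def verify(self):
        """Recompute every entry hash and check that the chain links are intact."""
        prev = self.GENESIS
        for entry in self.entries:
            body = json.dumps(entry["record"], sort_keys=True)
            expected = sha256_hex((prev + body).encode())
            if entry["prev"] != prev or entry["entry_hash"] != expected:
                return False
            prev = entry["entry_hash"]
        return True


# Placeholder byte strings stand in for a real dataset and real model weights.
dataset = make_record("dataset", b"training data bytes", [])
model = make_record("model", b"model weights bytes", [dataset["digest"]])

log = HashChainLog()
log.append(dataset)
log.append(model)
```

In this sketch the model record names the dataset digest as its parent, so a verifier can walk from a deployed model back to its training inputs, and `log.verify()` fails if any logged record is altered afterwards. A real system would additionally sign records and anchor the log with an independent party.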
