LG, CR | May 28, 2025

Machine Learning Models Have a Supply Chain Problem

Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov

DeepMind

arXiv:2505.22778v1 | 7.14 citations | h-index: 26 | Has Code | ICML

Originality: Synthesis-oriented

AI Analysis

This work addresses security vulnerabilities faced by users of open ML models, though it is incremental in that it applies existing software supply-chain solutions to the ML domain.

The paper identifies significant supply-chain risks in the open machine learning model ecosystem, such as a model being replaced with a malicious one or trained on poisoned data, and proposes using Sigstore to enable model signing and dataset verification for transparency.

Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
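To make the idea of signing a model concrete, the following is a minimal sketch of the integrity-checking step only, not the paper's implementation and not Sigstore's API. It assumes a model is a directory of files, hashes each file with SHA-256, and reduces the result to a single manifest digest. In a real deployment that digest is the kind of value a publisher would sign (for example with Sigstore tooling, which is not shown here) and a user would check before loading the model. The function names `file_digest`, `build_manifest`, `manifest_digest`, and `verify` are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Map each file's relative POSIX path to its SHA-256 digest."""
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    return {p.relative_to(model_dir).as_posix(): file_digest(p) for p in files}


def manifest_digest(manifest: dict[str, str]) -> str:
    """Digest of a canonical JSON encoding of the manifest.

    Assumption: this single value stands in for what a publisher would sign.
    """
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(model_dir: Path, expected_digest: str) -> bool:
    """Recompute the manifest digest and compare it to the published one."""
    return manifest_digest(build_manifest(model_dir)) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        (model_dir / "weights.bin").write_bytes(b"\x00\x01\x02")
        (model_dir / "config.json").write_text('{"layers": 2}')

        published = manifest_digest(build_manifest(model_dir))
        print(verify(model_dir, published))  # unmodified model

        # Simulate an attacker swapping the weights for something else.
        (model_dir / "weights.bin").write_bytes(b"malicious payload")
        print(verify(model_dir, published))  # tampered model
```

A digest check alone only detects changes relative to a value the user already trusts; binding that value to a publisher identity and recording it in a transparency log is the part the paper proposes Sigstore for.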
