CR, LG · Oct 25, 2019

Substra: a framework for privacy-preserving, traceable and collaborative Machine Learning

arXiv:1910.11567v1 · 53 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy and trust issues for organizations that handle sensitive data, such as those in healthcare, although it is an incremental extension of federated learning concepts.

The paper addresses privacy concerns in machine learning by introducing Substra, a distributed framework that enables collaborative training without sharing sensitive data. It combines distributed learning with a distributed ledger that provides traceability.
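The traceability property mentioned above rests on records that cannot be silently altered after the fact. The sketch below shows only that tamper-evidence idea with a hash chain in a single process. It is not Substra's ledger and not a distributed ledger: there is no replication or consensus among nodes. `TraceLog`, its fields, and the asset strings are hypothetical names chosen for illustration.

```python
import hashlib
import json
from typing import Dict, List


def record_hash(body: Dict) -> str:
    # Canonical JSON (sorted keys) so identical content always yields the same hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class TraceLog:
    """Hypothetical append-only log; each entry commits to its predecessor's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, actor: str, action: str, asset: str) -> Dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"actor": actor, "action": action, "asset": asset, "prev": prev}
        entry = dict(body, hash=record_hash(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check each link; any edit breaks the chain.
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("actor", "action", "asset", "prev")}
            if entry["prev"] != prev or entry["hash"] != record_hash(body):
                return False
            prev = entry["hash"]
        return True


log = TraceLog()
log.append("algo_designer", "register_algo", "algo_archive_v1")
log.append("hospital_a", "train", "model_v1")
log.append("hospital_b", "train", "model_v2")
print(log.verify())  # True

log.entries[1]["asset"] = "tampered"
print(log.verify())  # False
```

In a real distributed ledger, every participating node would hold a copy of such a chain and agree on new entries through a consensus protocol, which is what removes the need for a trusted third party.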

Machine learning is promising, but it often needs to process vast amounts of sensitive data, which raises privacy concerns. In this white paper, we introduce Substra, a distributed framework for privacy-preserving, traceable and collaborative Machine Learning. Substra gathers data providers and algorithm designers into a network of nodes that can train models on demand, but under advanced permission regimes. To guarantee data privacy, Substra implements distributed learning: the data never leave their nodes; only algorithms, predictive models and non-sensitive metadata are exchanged on the network. The computations are orchestrated by a Distributed Ledger Technology, which guarantees the traceability and authenticity of information without requiring trust in a third party. Although originally developed for healthcare applications, Substra is not specific to any data type, algorithm or programming language. It supports many types of computation plans, including the parallel computation plans commonly used in Federated Learning. With appropriate guidelines, it can be deployed for numerous Machine Learning use cases involving data or algorithm providers where trust is limited.
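The parallel computation plan mentioned above can be sketched as one round of federated averaging: every node trains the same shared model on its own data, and only the resulting parameters and sample counts travel back to be averaged. This is a minimal sketch under stated assumptions, not Substra's API: `Node`, `train_locally` and `parallel_round` are hypothetical names, the model is a one-feature linear regression, and the aggregation rule is a sample-weighted average (FedAvg-style) chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical data node; its (x, y) records never leave this object."""
    name: str
    data: List[Tuple[float, float]]

    def train_locally(self, w: float, b: float, lr: float = 0.05,
                      epochs: int = 50) -> Tuple[float, float, int]:
        # Gradient descent on mean squared error for y ~ w * x + b.
        n = len(self.data)
        for _ in range(epochs):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self.data) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self.data) / n
            w, b = w - lr * grad_w, b - lr * grad_b
        # Only parameters and a sample count are returned, never the records.
        return w, b, n


def parallel_round(nodes: List[Node], w: float, b: float) -> Tuple[float, float]:
    """One parallel round: all nodes start from the same model, then the
    returned parameters are averaged, weighted by local sample counts."""
    updates = [node.train_locally(w, b) for node in nodes]
    total = sum(n for _, _, n in updates)
    new_w = sum(wi * n for wi, _, n in updates) / total
    new_b = sum(bi * n for _, bi, n in updates) / total
    return new_w, new_b


# Hypothetical toy data: both nodes observe y = 2x + 1 on different x ranges.
nodes = [
    Node("hospital_a", [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)]),
    Node("hospital_b", [(x, 2 * x + 1) for x in (1.0, 2.0, 3.0)]),
]

w, b = 0.0, 0.0
for _ in range(30):
    w, b = parallel_round(nodes, w, b)
print(f"w={w:.3f}, b={b:.3f}")
```

Because the nodes train independently within a round, their updates could run concurrently; the only synchronization point is the averaging step, which is what distinguishes a parallel plan from a sequential one where the model is passed from node to node.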

