Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting
This paper addresses privacy concerns for data providers in distributed ML systems, though its contribution builds incrementally on federated learning.
The paper tackles private multi-party machine learning in untrusted settings by proposing a brokered learning abstraction and realizing it as TorMentor, an anonymous hidden service. In an experiment with 200 clients holding 14 MB of data each, TorMentor trained a logistic regression model with stochastic gradient descent in 65 seconds, while giving data providers a tunable trade-off between privacy and accuracy.
Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-party ML, to construct TorMentor: an anonymous hidden service that supports private multi-party ML. We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with new defenses for these attacks. We show that TorMentor effectively protects data providers against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy. We evaluate TorMentor with local and geo-distributed deployments on Azure/Tor. In an experiment with 200 clients and 14 MB of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65s. Code is available at: https://github.com/DistributedML/TorML
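The abstract describes clients contributing SGD updates for a shared logistic regression model through a broker, with a privacy knob that trades off against accuracy. The sketch below illustrates that loop under stated assumptions; it is not TorMentor's code. It assumes each client computes a mini-batch gradient, clips it to an L2 bound, and adds Gaussian noise (a standard differential-privacy-style mechanism used here as a stand-in). The broker then applies the noisy updates one client at a time. All names (`local_gradient`, `privatize`, `brokered_sgd`, `make_clients`) and parameters (`clip`, `sigma`, `lr`) are hypothetical. The sketch omits Tor anonymity, formal privacy accounting, and the paper's attack defenses.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def local_gradient(w, batch):
    # Average logistic-loss gradient over one client's mini-batch (labels in {0, 1}).
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [g / len(batch) for g in grad]


def privatize(grad, clip, sigma, rng):
    # Hypothetical privacy step: clip to L2 norm `clip`, then add Gaussian noise
    # with standard deviation sigma * clip per coordinate. Larger sigma means
    # more privacy and a noisier (less accurate) model.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma * clip) for g in grad]


def brokered_sgd(client_data, dim, rounds, lr, batch_size, clip, sigma, seed=0):
    # Assumed broker loop: each round, clients contribute in random order and the
    # broker applies each noisy update to the shared model immediately.
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(rounds):
        order = list(range(len(client_data)))
        rng.shuffle(order)
        for c in order:
            batch = rng.sample(client_data[c], batch_size)
            noisy = privatize(local_gradient(w, batch), clip, sigma, rng)
            w = [wi - lr * gi for wi, gi in zip(w, noisy)]
    return w


def accuracy(w, data):
    # Fraction of points whose predicted class matches the label.
    correct = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5) == (y == 1)
        for x, y in data
    )
    return correct / len(data)


def make_clients(n_clients, per_client, rng):
    # Synthetic, linearly separable data: x = [bias, a, b], label 1 iff a + b > 0.
    clients = []
    for _ in range(n_clients):
        data = []
        for _ in range(per_client):
            a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
            data.append(([1.0, a, b], 1 if a + b > 0 else 0))
        clients.append(data)
    return clients


if __name__ == "__main__":
    rng = random.Random(42)
    clients = make_clients(10, 50, rng)
    pooled = [pt for c in clients for pt in c]
    for sigma in (0.0, 1.0, 4.0):
        w = brokered_sgd(clients, dim=3, rounds=30, lr=0.5,
                         batch_size=10, clip=1.0, sigma=sigma)
        print(f"sigma={sigma}: accuracy={accuracy(w, pooled):.3f}")
```

Running the script prints accuracy for increasing `sigma`, which is one concrete way to see the privacy-accuracy trade-off. Raw data never leaves a client; only clipped, noised gradients reach the broker.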