CR

Dec 13, 2018

Training Set Camouflage

Ayon Sen, Scott Alfeld, Xuezhou Zhang, Ara Vartanian, Yuzhe Ma, Xiaojin Zhu

arXiv:1812.05725v1

5.82 citations

Originality: Incremental advance

AI Analysis

This paper addresses covert communication in machine learning, where a sender must hide an illicit training set from anyone monitoring the channel. The advance is incremental in that it applies steganography to ML.

The authors tackle the problem of covertly training a classifier on an illicit task. They introduce training set camouflage, a steganographic method that replaces the original training set with a benign-looking one drawn from a different task. From it, the receiver (Bob) approximately recovers the original classifier using a standard learning algorithm.

We introduce a form of steganography in the domain of machine learning which we call training set camouflage. Imagine Alice has a training set on an illicit machine learning classification task. Alice wants Bob (a machine learning system) to learn the task. However, sending either the training set or the trained model to Bob can raise suspicion if the communication is monitored. Training set camouflage allows Alice to compute a second training set on a completely different -- and seemingly benign -- classification task. By construction, sending the second training set will not raise suspicion. When Bob applies his standard (public) learning algorithm to the second training set, he approximately recovers the classifier on the original task. Training set camouflage is a novel form of steganography in machine learning. We formulate training set camouflage as a combinatorial bilevel optimization problem and propose solvers based on nonlinear programming and local search. Experiments on real classification tasks demonstrate the feasibility of such camouflage.
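The local-search idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' solver: it assumes Bob's public learner is a nearest-centroid classifier, uses synthetic 2-D data, and omits any check that an eavesdropper might run on the transmitted set. All function names (`train_centroids`, `secret_error`, `local_search`, `make_demo_data`) are hypothetical. Alice picks a fixed-size subset of a benign pool so that the model Bob trains on that subset has low error on her secret task.

```python
import random

def train_centroids(data):
    """Bob's public learner (an assumption here): nearest-centroid on 2-D points."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda lab: (centroids[lab][0] - px) ** 2 + (centroids[lab][1] - py) ** 2,
    )

def secret_error(subset, secret):
    """Outer objective: error on the secret task of the model Bob learns from subset."""
    centroids = train_centroids(subset)  # inner problem: Bob's training step
    wrong = sum(predict(centroids, p) != lab for p, lab in secret)
    return wrong / len(secret)

def local_search(pool, secret, size, steps, rng):
    """Swap-based local search over fixed-size subsets of the benign pool."""
    indices = list(range(len(pool)))
    chosen = rng.sample(indices, size)
    best = secret_error([pool[i] for i in chosen], secret)
    for _ in range(steps):
        pos = rng.randrange(size)      # slot in the subset to replace
        cand = rng.choice(indices)     # candidate pool item to swap in
        if cand in chosen:
            continue
        trial = chosen[:]
        trial[pos] = cand
        err = secret_error([pool[i] for i in trial], secret)
        if err <= best:                # accept swaps that do not hurt
            chosen, best = trial, err
    return [pool[i] for i in chosen], best

def make_demo_data(rng, n_secret=40, n_pool=200):
    """Synthetic stand-ins: the secret task splits on x, the benign task on y."""
    secret = []
    for _ in range(n_secret):
        label = rng.randrange(2)
        cx = 2.0 if label == 1 else -2.0
        secret.append(((rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5)), label))
    pool = []
    for _ in range(n_pool):
        x, y = rng.uniform(-3, 3), rng.uniform(-3, 3)
        pool.append(((x, y), 1 if y > 0 else 0))
    return secret, pool

if __name__ == "__main__":
    secret, pool = make_demo_data(random.Random(0))
    camouflage, err = local_search(pool, secret, size=20, steps=300, rng=random.Random(1))
    print("items sent to Bob:", len(camouflage), "error on secret task:", err)
```

Every item Bob receives is a correctly labeled example of the benign task; only the choice of which items to send carries information about the secret task. A fuller treatment would add a constraint that the selected subset passes whatever test a monitor applies.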
