First to Possess His Statistics: Data-Free Model Extraction Attack on Tabular Data
This work addresses security vulnerabilities in machine learning models used in domains such as medical diagnosis, though the contribution is incremental because it builds on existing attack methods.
The paper tackles model extraction attacks on tabular data by introducing TEMPEST, a data-free method that uses publicly available statistics to generate query samples, achieving performance comparable to previous attacks without needing initial data.
Model extraction attacks are a class of attacks in which an adversary uses queries and their results to obtain a machine learning model whose performance is comparable to that of the victim model. This paper presents a novel model extraction attack, named TEMPEST, that is applicable to tabular data under a practical data-free setting. Although model extraction is more challenging on tabular data because of normalization, TEMPEST does not need the initial samples that previous attacks require; instead, it uses publicly available statistics to generate query samples. Experiments show that our attack can achieve the same level of performance as the previous attacks. Moreover, we find that using the mean and variance as statistics for query generation, and using the same normalization process as the victim model, can improve the performance of our attack. We also discuss the possibility of executing TEMPEST in the real world through an experiment with a medical diagnosis dataset. We plan to release the source code for reproducibility and as a reference for subsequent work.
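As a rough illustration of the idea of generating queries from public statistics, the sketch below draws synthetic tabular samples from per-feature means and variances and then standardizes them. It is not the TEMPEST implementation: the independent Gaussian sampling, the z-score normalization, the function names `generate_queries` and `standardize`, and the example statistics are all assumptions made for illustration.

```python
import numpy as np

def generate_queries(means, variances, n_queries, seed=0):
    """Draw synthetic rows, one independent Gaussian per feature (assumed distribution)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.sqrt(np.asarray(variances, dtype=float))
    # loc and scale broadcast across the (n_queries, n_features) output
    return rng.normal(loc=means, scale=stds, size=(n_queries, means.size))

def standardize(x, means, variances):
    """Z-score normalization, assumed here to match the victim's preprocessing."""
    means = np.asarray(means, dtype=float)
    stds = np.sqrt(np.asarray(variances, dtype=float))
    return (x - means) / stds

# Hypothetical public statistics for two features (values are made up).
public_means = [50.0, 120.0]
public_vars = [100.0, 225.0]

queries = generate_queries(public_means, public_vars, n_queries=1000)
normalized = standardize(queries, public_means, public_vars)
```

In an actual attack, the normalized queries would be sent to the victim model and the returned labels used to train a substitute model; that step is omitted here because it depends on the victim's interface.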