Abigail Goldsteen

LG
h-index9
8papers
117citations
Novelty40%
AI Score25

8 Papers

LGOct 11, 2023
Improved Membership Inference Attacks Against Language Classification Models

Shlomit Shachor, Natalia Razinkov, Abigail Goldsteen

Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.

LGJul 3, 2024
Membership Inference Attacks Against Time-Series Models

Noam Koren, Abigail Goldsteen, Guy Amit et al.

Analyzing time-series data that contains personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making knowledgeable decisions on whether to use a model in production or share it with third parties. Membership Inference Attacks (MIA) are a key method for this kind of evaluation, however time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features, focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications.

LGMar 13, 2024
SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks

Guy Amit, Abigail Goldsteen, Ariel Farkash

Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.

LGAug 6, 2020
Data Minimization for GDPR Compliance in Machine Learning Models

Abigail Goldsteen, Gilad Ezov, Ron Shmelkin et al.

The EU General Data Protection Regulation (GDPR) mandates the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as neural networks. We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features. Our method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy. This enables the creators and users of machine learning models to acheive data minimization, in a provable manner.

CRJul 26, 2020
Anonymizing Machine Learning Models

Abigail Goldsteen, Gilad Ezov, Ron Shmelkin et al.

There is a known tension between the need to analyze personal data to drive business and privacy concerns. Many data protection regulations, including the EU General Data Protection Regulation (GDPR) and the California Consumer Protection Act (CCPA), set out strict restrictions and obligations on the collection and processing of personal data. Moreover, machine learning models themselves can be used to derive personal information, as demonstrated by recent membership and attribute inference attacks. Anonymized data, however, is exempt from the obligations set out in these regulations. It is therefore desirable to be able to create models that are anonymized, thus also exempting them from those obligations, in addition to providing better protection against attacks. Learning on anonymized data typically results in significant degradation in accuracy. In this work, we propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model, and guiding our anonymization process to minimize the impact on the model's accuracy, a process we call accuracy-guided anonymization. We demonstrate that by focusing on the model's accuracy rather than generic information loss measures, our method outperforms state of the art k-anonymity methods in terms of the achieved utility, in particular with high values of k and large numbers of quasi-identifiers. We also demonstrate that our approach has a similar, and sometimes even better ability to prevent membership inference attacks as approaches based on differential privacy, while averting some of their drawbacks such as complexity, performance overhead and model-specific implementations. This makes model-guided anonymization a legitimate substitute for such methods and a practical approach to creating privacy-preserving models.

LGJun 29, 2020
Reducing Risk of Model Inversion Using Privacy-Guided Training

Abigail Goldsteen, Gilad Ezov, Ariel Farkash

Machine learning models often pose a threat to the privacy of individuals whose data is part of the training set. Several recent attacks have been able to infer sensitive information from trained models, including model inversion or attribute inference attacks. These attacks are able to reveal the values of certain sensitive features of individuals who participated in training the model. It has also been shown that several factors can contribute to an increased risk of model inversion, including feature influence. We observe that not all features necessarily share the same level of privacy or sensitivity. In many cases, certain features used to train a model are considered especially sensitive and therefore propitious candidates for inversion. We present a solution for countering model inversion attacks in tree-based models, by reducing the influence of sensitive features in these models. This is an avenue that has not yet been thoroughly investigated, with only very nascent previous attempts at using this as a countermeasure against attribute inference. Our work shows that, in many cases, it is possible to train a model in different ways, resulting in different influence levels of the various features, without necessarily harming the model's accuracy. We are able to utilize this fact to train models in a manner that reduces the model's reliance on the most sensitive features, while increasing the importance of less sensitive features. Our evaluation confirms that training models in this manner reduces the risk of inference for those features, as demonstrated through several black-box and white-box attacks.

CROct 30, 2019
Forgotten @ Scale: A Practical Solution for Implementing the Right To Be Forgotten in Large-Scale Systems

Abigail Goldsteen, Tomer Douek, Yaniv Cohen et al.

The European General Data Protection Regulation asserts data subjects' right to be forgotten, i.e., their right to request that all their personal data be deleted from an organizations' data stores. However, fulfilling such requests in large-scale systems is technically challenging. It requires that organizations keep track of all locations in which an individual's data is stored, be able to access and delete it in a reasonable time frame, and be able to prove that all such data was in fact deleted. In addition, organizations must cope with complexities such as multiple, distributed, and continuously evolving systems of record, complex data retention policies and deletion approval workflows. We present a first design pattern and practical implementation of the right to be forgotten on a large scale in Big Data and cloud environments.

CRJun 22, 2015
Proceedings of the Ninth Workshop on Web 2.0 Security and Privacy (W2SP) 2015

Abigail Goldsteen, Tyrone Grandison, Mike Just et al.

This is the Proceedings of the Ninth Workshop on Web 2.0 Security and Privacy (W2SP) 2015, held in San Jose, CA, USA, on May 21, 2015. The workshop was held as part of the IEEE Computer Society Security and Privacy Workshops, in conjunction with the IEEE Symposium on Security and Privacy.