LGNov 2, 2023Code
DP-Mix: Mixup-based Data Augmentation for Differentially Private LearningWenxuan Bao, Francesco Pittaluga, Vijay Kumar B G et al.
Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at https://github.com/wenxuan-Bao/DP-Mix.
CLOct 24, 2023
SoK: Memorization in General-Purpose Large Language ModelsValentin Hartmann, Anshuman Suri, Vincent Bindschaedler et al.
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.
CRMar 10, 2022
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition SystemsHadi Abdullah, Aditya Karlekar, Saurabh Prasad et al.
Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new "robust to transcription" but "easy for humans to understand" CAPTCHAs. After showing that none of these attacks alone are sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., $P({\rm transcription}) = 4 \times 10^{-5}$). Finally, we demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems ($P({\rm evasion}) = 1.77 \times 10^{-4}$). In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but that such systems can be designed based on the insights gained from attack papers using the differences between the ways that humans and computers process audio.
CRMay 13, 2022
On the Importance of Architecture and Feature Selection in Differentially Private Machine LearningWenxuan Bao, Luke A. Bauer, Vincent Bindschaedler
We study a pitfall in the typical workflow for differentially private machine learning. The use of differentially private learning algorithms in a "drop-in" fashion -- without accounting for the impact of differential privacy (DP) noise when choosing what feature engineering operations to use, what features to select, or what neural network architecture to use -- yields overly complex and poorly performing models. In other words, by anticipating the impact of DP noise, a simpler and more accurate alternative model could have been trained for the same privacy guarantee. We systematically study this phenomenon through theory and experiments. On the theory front, we provide an explanatory framework and prove that the phenomenon arises naturally from the addition of noise to satisfy differential privacy. On the experimental front, we demonstrate how the phenomenon manifests in practice using various datasets, types of models, tasks, and neural network architectures. We also analyze the factors that contribute to the problem and distill our experimental insights into concrete takeaways that practitioners can follow when training models with differential privacy. Finally, we propose privacy-aware algorithms for feature selection and neural network architecture search. We analyze their differential privacy properties and evaluate them empirically.
CYApr 14
Characterizing Resource Sharing Practices on Underground Internet Forum Synthetic Non-Consensual Intimate Image Content Creation CommunitiesBernardo B. P. Medeiros, Malvika Jadhav, Allison Lu et al.
Many malicious actors responsible for disseminating synthetic non-consensual intimate imagery (SNCII) operate within internet forums to exchange resources, strategies, and generated content across multiple platforms. Technically-sophisticated actors gravitate toward certain communities (e.g., 4chan), while lower-sophistication end-users are more active on others (e.g., Reddit). To characterize key stakeholders in the broader ecosystem, we perform an integrated analysis of multiple communities, analyzing 282,154 4chan comments and 78,308 Reddit submissions spanning 165 days between June and November 2025 to characterize involved actors, actions, and resources. We find: (a) that users with differing levels of technical sophistication employ and share a wide range of primary resources facilitating SNCII content creation as well as numerous secondary resources facilitating dissemination; and (b) that knowledge transfer between experts and newcomers facilitates propagation of these illicit resources. Based on our empirical analysis, we identify gaps in existing SNCII regulatory infrastructure and synthesize several critical intervention points for bolstering deterrence.
CVOct 8, 2025
Cluster Paths: Navigating Interpretability in Neural NetworksNicholas M. Kroeger, Vincent Bindschaedler
While modern deep neural networks achieve impressive performance in vision tasks, they remain opaque in their decision processes, risking unwarranted trust, undetected biases and unexpected failures. We propose cluster paths, a post-hoc interpretability method that clusters activations at selected layers and represents each input as its sequence of cluster IDs. To assess these cluster paths, we introduce four metrics: path complexity (cognitive load), weighted-path purity (class alignment), decision-alignment faithfulness (predictive fidelity), and path agreement (stability under perturbations). In a spurious-cue CIFAR-10 experiment, cluster paths identify color-based shortcuts and collapse when the cue is removed. On a five-class CelebA hair-color task, they achieve 90% faithfulness and maintain 96% agreement under Gaussian noise without sacrificing accuracy. Scaling to a Vision Transformer pretrained on ImageNet, we extend cluster paths to concept paths derived from prompting a large language model on minimal path divergences. Finally, we show that cluster paths can serve as an effective out-of-distribution (OOD) detector, reliably flagging anomalous samples before the model generates over-confident predictions. Cluster paths uncover visual concepts, such as color palettes, textures, or object contexts, at multiple network depths, demonstrating that cluster paths scale to large vision models while generating concise and human-readable explanations.
LGAug 21, 2025
Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)Wenxuan Bao, Vincent Bindschaedler
There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or if they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures make direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. Results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices to obtain scientifically valid and reliable results.
LGNov 18, 2021
Enhanced Membership Inference Attacks against Machine Learning ModelsJiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda et al.
How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive \textit{hypothesis testing framework} that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain \textit{why} different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a \textit{differential analysis} between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the \textit{Privacy Meter} software tool.
CROct 25, 2021
Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRsHadi Abdullah, Muhammad Sajidur Rahman, Christian Peeters et al.
Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a user study to verify that our method creates low audible distortion. Specifically, 80 of the 100 participants voted in favor of all our attack audio samples as less noisier than the existing state-of-the-art attack. Through this, we demonstrate both types of existing ASR pipelines can be exploited with minimum degradation to attack audio quality.
CROct 13, 2021
Leveraging Generative Models for Covert Messaging: Challenges and Tradeoffs for "Dead-Drop" DeploymentsLuke A. Bauer, James K. Howes, Sam A. Markelon et al.
State of the art generative models of human-produced content are the focus of many recent papers that explore their use for steganographic communication. In particular, generative models of natural language text. Loosely, these works (invertibly) encode message-carrying bits into a sequence of samples from the model, ultimately yielding a plausible natural language covertext. By focusing on this narrow steganographic piece, prior work has largely ignored the significant algorithmic challenges, and performance-security tradeoffs, that arise when one actually tries to build a messaging pipeline around it. We make these challenges concrete, by considering the natural application of such a pipeline: namely, "dead-drop" covert messaging over large, public internet platforms (e.g. social media sites). We explicate the challenges and describe approaches to overcome them, surfacing in the process important performance and security tradeoffs that must be carefully tuned. We implement a system around this model-based format-transforming encryption pipeline, and give an empirical analysis of its performance and (heuristic) security.
CRJul 21, 2021
Generative Models for Security: Attacks, Defenses, and OpportunitiesLuke A. Bauer, Vincent Bindschaedler
Generative models learn the distribution of data from a sample dataset and can then generate new data instances. Recent advances in deep learning has brought forth improvements in generative model architectures, and some state-of-the-art models can (in some cases) produce outputs realistic enough to fool humans. We survey recent research at the intersection of security and privacy and generative models. In particular, we discuss the use of generative models in adversarial machine learning, in helping automate or enhance existing attacks, and as building blocks for defenses in contexts such as intrusion detection, biometrics spoofing, and malware obfuscation. We also describe the use of generative models in diverse applications such as fairness in machine learning, privacy-preserving data synthesis, and steganography. Finally, we discuss new threats due to generative models: the creation of synthetic media such as deepfakes that can be used for disinformation.
CRJul 13, 2020
SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification SystemsHadi Abdullah, Kevin Warren, Vincent Bindschaedler et al.
Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy in neural networks. Like other systems based on neural networks, recent research has demonstrated that speech and speaker recognition systems are vulnerable to attacks using manipulated inputs. However, as we demonstrate in this paper, the end-to-end architecture of speech and speaker systems and the nature of their inputs make attacks and defenses against them substantially different than those in the image space. We demonstrate this first by systematizing existing research in this space and providing a taxonomy through which the community can evaluate future work. We then demonstrate experimentally that attacks against these models almost universally fail to transfer. In so doing, we argue that substantial additional work is required to provide adequate mitigations in this space.
CRFeb 13, 2018
Understanding Membership Inferences on Well-Generalized Learning ModelsYunhui Long, Vincent Bindschaedler, Lei Wang et al.
Membership Inference Attack (MIA) determines the presence of a record in a machine learning model's training data by querying the model. Prior work has shown that the attack is feasible when the model is overfitted to its training data or when the adversary controls the training algorithm. However, when the model is not overfitted and the adversary does not control the training algorithm, the threat is not well understood. In this paper, we report a study that discovers overfitting to be a sufficient but not a necessary condition for an MIA to succeed. More specifically, we demonstrate that even a well-generalized model contains vulnerable instances subject to a new generalized MIA (GMIA). In GMIA, we use novel techniques for selecting vulnerable instances and detecting their subtle influences ignored by overfitting metrics. Specifically, we successfully identify individual records with high precision in real-world datasets by querying black-box machine learning models. Further we show that a vulnerable record can even be indirectly attacked by querying other related records and existing generalization techniques are found to be less effective in protecting the vulnerable instances. Our findings sharpen the understanding of the fundamental cause of the problem: the unique influences the training instance may have on the model.
CRDec 25, 2017
Towards Measuring Membership PrivacyYunhui Long, Vincent Bindschaedler, Carl A. Gunter
Machine learning models are increasingly made available to the masses through public query interfaces. Recent academic work has demonstrated that malicious users who can query such models are able to infer sensitive information about records within the training data. Differential privacy can thwart such attacks, but not all models can be readily trained to achieve this guarantee or to achieve it with acceptable utility loss. As a result, if a model is trained without differential privacy guarantee, little is known or can be said about the privacy risk of releasing it. In this work, we investigate and analyze membership attacks to understand why and how they succeed. Based on this understanding, we propose Differential Training Privacy (DTP), an empirical metric to estimate the privacy risk of publishing a classier when methods such as differential privacy cannot be applied. DTP is a measure of a classier with respect to its training dataset, and we show that calculating DTP is efficient in many practical cases. We empirically validate DTP using state-of-the-art machine learning models such as neural networks trained on real-world datasets. Our results show that DTP is highly predictive of the success of membership attacks and therefore reducing DTP also reduces the privacy risk. We advocate for DTP to be used as part of the decision-making process when considering publishing a classifier. To this end, we also suggest adopting the DTP-1 hypothesis: if a classifier has a DTP value above 1, it should not be published.
CRAug 26, 2017
Plausible Deniability for Privacy-Preserving Data SynthesisVincent Bindschaedler, Reza Shokri, Carl A. Gunter
Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many of the popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical to use for releasing high dimensional data or cannot preserve high utility of original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain amount of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Also, it can efficiently be checked by privacy tests. We present mechanisms to generate synthetic datasets with similar statistical properties to the input data and the same format. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of original data with respect to various statistical analysis and machine learning measures.
CRMay 20, 2017
Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGXWenhao Wang, Guoxing Chen, Xiaorui Pan et al.
Side-channel risks of Intel's SGX have recently attracted great attention. Under the spotlight is the newly discovered page-fault attack, in which an OS-level adversary induces page faults to observe the page-level access patterns of a protected process running in an SGX enclave. With almost all proposed defense focusing on this attack, little is known about whether such efforts indeed raise the bar for the adversary, whether a simple variation of the attack renders all protection ineffective, not to mention an in-depth understanding of other attack surfaces in the SGX system. In the paper, we report the first step toward systematic analyses of side-channel threats that SGX faces, focusing on the risks associated with its memory management. Our research identifies 8 potential attack vectors, ranging from TLB to DRAM modules. More importantly, we highlight the common misunderstandings about SGX memory side channels, demonstrating that high frequent AEXs can be avoided when recovering EdDSA secret key through a new page channel and fine-grained monitoring of enclave programs (at the level of 64B) can be done through combining both cache and cross-enclave DRAM channels. Our findings reveal the gap between the ongoing security research on SGX and its side-channel weaknesses, redefine the side-channel threat model for secure enclaves, and can provoke a discussion on when to use such a system and how to use it securely.
CRMay 27, 2015
Privacy through Fake yet Semantically Real TracesVincent Bindschaedler, Reza Shokri
Camouflaging data by generating fake information is a well-known obfuscation technique for protecting data privacy. In this paper, we focus on a very sensitive and increasingly exposed type of data: location data. There are two main scenarios in which fake traces are of extreme value to preserve location privacy: publishing datasets of location trajectories, and using location-based services. Despite advances in protecting (location) data privacy, there is no quantitative method to evaluate how realistic a synthetic trace is, and how much utility and privacy it provides in each scenario. Also, the lack of a methodology to generate privacy-preserving fake traces is evident. In this paper, we fill this gap and propose the first statistical metric and model to generate fake location traces such that both the utility of data and the privacy of users are preserved. We build upon the fact that, although geographically they visit distinct locations, people have strongly semantically similar mobility patterns, for example, their transition pattern across activities (e.g., working, driving, staying at home) is similar. We define a statistical metric and propose an algorithm that automatically discovers the hidden semantic similarities between locations from a bag of real location traces as seeds, without requiring any initial semantic annotations. We guarantee that fake traces are geographically dissimilar to their seeds, so they do not leak sensitive location information. We also protect contributors to seed traces against membership attacks. Interleaving fake traces with mobile users' traces is a prominent location privacy defense mechanism. We quantitatively show the effectiveness of our methodology in protecting against localization inference attacks while preserving utility of sharing/publishing traces.