CRSep 30, 2022
Data Poisoning Attacks Against Multimodal EncodersZiqing Yang, Xinlei He, Zheng Li et al.
Recently, the newly emerged multimodal models, which leverage both visual and linguistic modalities to train powerful encoders, have gained increasing attention. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of potential poisoning attacks, whereby the adversary aims to perturb the model's training data to trigger malicious behaviors in it. In contrast to previous work, only poisoning visual modality, in this work, we take the first step to studying poisoning attacks against multimodal models in both visual and linguistic modalities. Specially, we focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer the two questions, we propose three types of poisoning attacks against multimodal models. Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities. Furthermore, we observe that the poisoning effect differs between different modalities. To mitigate the attacks, we propose both pre-training and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model's utility.
CRDec 18, 2022
Fine-Tuning Is All You Need to Mitigate Backdoor AttacksZeyang Sha, Xinlei He, Pascal Berrang et al.
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models' utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models' backdoor vulnerabilities.
CRMay 19
Evasion Under Blockchain SanctionsEndong Liu, Mark Ryan, Liyi Zhou et al.
Sanctioning blockchain addresses has become a common regulatory response to malicious activities. However, enforcement on permissionless blockchains remains challenging due to complex transaction flows and sophisticated fund-obfuscation techniques. Using cryptocurrency mixing tool Tornado Cash as a case study, we quantitatively assess the effectiveness of U.S. Office of Foreign Assets Control (OFAC) sanctions over a 957-day period, covering 6.79 million Ethereum blocks and 1.07 billion transactions. Our analysis reveals that while OFAC sanctions reduced overall Tornado Cash deposit volume by 71.03% to approximately 2 billion USD, attackers still relied on Tornado Cash in 78.33% of Ethereum-related security incidents, underscoring persistent evasion strategies. In this paper, we identify three significant, structural limitations in current sanction enforcement practices: (i) fragmented censorship in blockchain consensus and application layer; (ii) the complexity of obfuscation virtual asset services exploited by users; and (iii) the susceptibility of naive binary sanction classifications to dusting attacks. Our analysis and findings contribute to ongoing discussions around regulatory effectiveness in Decentralized Finance by providing empirical evidence, clarifying enforcement challenges, and informing future compliance strategies in response to sanctions and blockchain-based security risks.
LGMay 11
Beyond Red-Teaming: Formal Guarantees of LLM Guardrail ClassifiersNikita Kezins, Urbas Ekka, Pascal Berrang et al.
Guardrail Classifiers defend production language models against harmful behavior, but although results seem promising in testing, they provide no formal guarantees. Providing formal guarantees for such models is hard because "harmful behavior" has no natural specification in a discrete input space: and the standard epsilon-ball properties used in other domains do not carry semantic meaning. We close this gap by shifting verification from the discrete input space to the classifier's pre-activation space, where we define a harmful region as a convex shape enclosing the representations of known harmful prompts. Because the sigmoid classification head is monotonic, certifying the worst-case point is sufficient to certify the entire region, yielding a closed-form soundness proof without approximation in O(d) time. To formally evaluate these classifiers, we propose two constructions of such regions: SVD-aligned hyper-rectangles, which yield exact SAT/UNSAT certificates, and Gaussian Mixture Models, which yield probabilistic certificates over semantically coherent clusters. Applying this framework to three author-trained Guardrail Classifiers on the toxicity domain, every hyper-rectangle configuration returns SAT, exposing verifiable safety holes across all classifiers, despite seemingly high empirical metrics. Probabilistic GMM certificates also expose a divergent structural stability in how these models represent harm. While GPT-2 and Llama-3.1-8B maintain robust coverage of 90% and 80% across varying boundaries, BERT's safety guarantees prove uniquely volatile. This 'coverage collapse' to 55% at the optimal threshold reveals a sparsely populated safety margin in BERT, which only achieves full coverage by adopting an extremely conservative pessimistic threshold. These approaches combined, provide new insights on how effective Guardrail Classifiers really are, beyond traditional red-teaming.
CRMar 4, 2019Code
Albatross: An optimistic consensus algorithmPascal Berrang, Inês Cruz, Bruno França et al.
The consensus protocol is a critical component of distributed ledgers and blockchains. Achieving consensus over a decentralized network poses challenges to transaction finality and performance. Currently, the highest-performing consensus algorithms are speculative BFT algorithms, which, however, compromise on the transaction finality guarantees offered by their non-speculative counterparts. In this paper, we introduce Albatross, a Proof-of-Stake (PoS) blockchain consensus algorithm that aims to combine the best of both worlds. At its heart, Albatross is a high-performing, speculative BFT algorithm that offers strong probabilistic finality. We complement this by periodically guaranteeing finality through the Tendermint protocol. We prove our protocol to be secure under standard BFT assumptions and analyze its performance both on a theoretical and practical level. For that, we provide an open-source Rust implementation of Albatross. Our real-world measurements support that our protocol has a performance close to the theoretical maximum for single-chain Proof-of-Stake consensus algorithms.
CRMay 1
Zero-Knowledge Model CheckingPascal Berrang, Mirco Giacobbe, Jacob Swales et al.
We introduce a technology to formally verify that a software system satisfies a temporal specification of functional correctness, without revealing the system itself. Our method combines a deductive approach to model checking to obtain a formal certificate of correctness for the system, with zero-knowledge proofs to convince an external verifier that the system -- kept secret -- complies with its specification of correctness -- made public. We consider proof certificates represented as ranking functions, and introduce both an explicit-state and a symbolic scheme for model checking in zero knowledge. Our explicit-state scheme assumes systems represented as transition graphs. We use polynomial commitments to convince the verifier that the public proof certificates correspond to the secret transition relation. Our symbolic scheme assumes systems specified as linear guarded commands and uses piecewise-linear ranking functions. We apply Farkas' lemma to obtain a witness for the validity of the ranking function with public and secret components, and employ sigma protocols for matrix multiplication and range proofs to convince the verifier of the witness's existence. We built a prototype to demonstrate the practical efficacy of our two schemes on linear temporal logic verification examples. Our technology enables formal verification in domains where both the safety and the confidentiality of the system under analysis are critical.
CRMay 9, 2024
Link Stealing Attacks Against Inductive Graph Neural NetworksYixin Wu, Xinlei He, Pascal Berrang et al.
A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings, including the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at the training time. In the inductive setting, the trained model can be generalized to new nodes/graphs. Due to its flexibility, the inductive setting is the most popular GNN setting at the moment. Previous work has shown that transductive GNNs are vulnerable to a series of privacy attacks. However, a comprehensive privacy analysis of inductive GNN models is still missing. This paper fills the gap by conducting a systematic privacy analysis of inductive GNNs through the lens of link stealing attacks, one of the most popular attacks that are specifically designed for GNNs. We propose two types of link stealing attacks, i.e., posterior-only attacks and combined attacks. We define threat models of the posterior-only attacks with respect to node topology and the combined attacks by considering combinations of posteriors, node attributes, and graph features. Extensive evaluation on six real-world datasets demonstrates that inductive GNNs leak rich information that enables link stealing attacks with advantageous properties. Even attacks with no knowledge about graph structures can be effective. We also show that our attacks are robust to different node similarities and different graph features. As a counterpart, we investigate two possible defenses and discover they are ineffective against our attacks, which calls for more effective defenses.
CRFeb 20, 2022
Accountable Javascript Code DeliveryIlkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon et al.
The internet is a major distribution platform for web applications, but there are no effective transparency and audit mechanisms in place for the web. Due to the ephemeral nature of web applications, a client visiting a website has no guarantee that the code it receives today is the same as yesterday, or the same as other visitors receive. Despite advances in web security, it is thus challenging to audit web applications before they are rendered in the browser. We propose Accountable JS, a browser extension and opt in protocol for accountable delivery of active content on a web page. We prototype our protocol, formally model its security properties with the Tamarin Prover, and evaluate its compatibility and performance impact with case studies including WhatsApp Web, AdSense and Nimiq. Accountability is beginning to be deployed at scale, with Meta's recent announcement of Code Verify available to all 2 billion WhatsApp users, but there has been little formal analysis of such protocols. We formally model Code Verify using the Tamarin Prover and compare its properties to our Accountable JS protocol. We also compare Code Verify's and Accountable JS extension's performance impacts on WhatsApp Web.
CRJan 22, 2022
On How Zero-Knowledge Proof Blockchain Mixers Improve, and Worsen User PrivacyZhipeng Wang, Stefanos Chaliasos, Kaihua Qin et al.
Zero-knowledge proof (ZKP) mixers are one of the most widely-used blockchain privacy solutions, operating on top of smart contract-enabled blockchains. We find that ZKP mixers are tightly intertwined with the growing number of Decentralized Finance (DeFi) attacks and Blockchain Extractable Value (BEV) extractions. Through coin flow tracing, we discover that 205 blockchain attackers and 2,595 BEV extractors leverage mixers as their source of funds, while depositing a total attack revenue of 412.87M USD. Moreover, the US OFAC sanctions against the largest ZKP mixer, Tornado.Cash, have reduced the mixer's daily deposits by more than 80%. Further, ZKP mixers advertise their level of privacy through a so-called anonymity set size, which similarly to k-anonymity allows a user to hide among a set of k other users. Through empirical measurements, we, however, find that these anonymity set claims are mostly inaccurate. For the most popular mixers on Ethereum (ETH) and Binance Smart Chain (BSC), we show how to reduce the anonymity set size on average by 27.34% and 46.02% respectively. Our empirical evidence is also the first to suggest a differing privacy-predilection of users on ETH and BSC. State-of-the-art ZKP mixers are moreover interwoven with the DeFi ecosystem by offering anonymity mining (AM) incentives, i.e., users receive monetary rewards for mixing coins. However, contrary to the claims of related work, we find that AM does not necessarily improve the quality of a mixer's anonymity set. Our findings indicate that AM attracts privacy-ignorant users, who then do not contribute to improving the privacy of other mixer users.
CRJun 4, 2018
ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning ModelsAhmed Salem, Yang Zhang, Mathias Humbert et al.
Machine learning (ML) has become a core component of many real-world applications and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications. However, the early demonstrations of the feasibility of such attacks have many assumptions on the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model's training data. We relax all these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and thereby pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat using eight diverse datasets which show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against such broader class of membership inference attacks that maintain a high level of utility of the ML model.
CRFeb 11, 2015
From Closed-world Enforcement to Open-world Assessment of PrivacyMichael Backes, Pascal Berrang, Praveen Manoharan
In this paper, we develop a user-centric privacy framework for quantitatively assessing the exposure of personal information in open settings. Our formalization addresses key-challenges posed by such open settings, such as the unstructured dissemination of heterogeneous information and the necessity of user- and context-dependent privacy requirements. We propose a new definition of information sensitivity derived from our formalization of privacy requirements, and, as a sanity check, show that hard non-disclosure guarantees are impossible to achieve in open settings. After that, we provide an instantiation of our framework to address the identity disclosure problem, leading to the novel notion of d-convergence. d-convergence is based on indistinguishability of entities and it bounds the likelihood with which an adversary successfully links two profiles of the same user across online communities. Finally, we provide a large-scale evaluation of our framework on a collection of 15 million comments collected from the Online Social Network Reddit. Our evaluation validates the notion of d-convergence for assessing the linkability of entities in our data set and provides deeper insights into the data set's structure.