LGApr 20, 2022
Adversarial Scratches: Deployable Attacks to CNN ClassifiersLoris Giulivi, Malhar Jere, Loris Rossi et al.
A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most literature focuses on visually imperceivable perturbations to be applied to digital images that often are, by design, impossible to be deployed to physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images, and which possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage Bézier Curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that, often, our attack achieves higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.
CLJun 2, 2023
PassGPT: Password Modeling and (Guided) Generation with Large Language ModelsJavier Rando, Fernando Perez-Cruz, Briland Hitaj · eth-zurich
Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.
LOMar 1, 2023
CoProver: A Recommender System for Proof ConstructionEric Yeh, Briland Hitaj, Sam Owre et al.
Interactive Theorem Provers (ITPs) are an indispensable tool in the arsenal of formal method experts as a platform for construction and (formal) verification of proofs. The complexity of the proofs in conjunction with the level of expertise typically required for the process to succeed can often hinder the adoption of ITPs. A recent strain of work has investigated methods to incorporate machine learning models trained on ITP user activity traces as a viable path towards full automation. While a valuable line of investigation, many problems still require human supervision to be completed fully, thus applying learning methods to assist the user with useful recommendations can prove more fruitful. Following the vein of user assistance, we introduce CoProver, a proof recommender system based on transformers, capable of learning from past actions during proof construction, all while exploring knowledge stored in the ITP concerning previous proofs. CoProver employs a neurally learnt sequence-based encoding of sequents, capturing long distance relationships between terms and hidden cues therein. We couple CoProver with the Prototype Verification System (PVS) and evaluate its performance on two key areas, namely: (1) Next Proof Action Recommendation, and (2) Relevant Lemma Retrieval given a library of theories. We evaluate CoProver on a series of well-established metrics originating from the recommender system and information retrieval communities, respectively. We show that CoProver successfully outperforms prior state of the art applied to recommendation in the domain. We conclude by discussing future directions viable for CoProver (and similar approaches) such as argument prediction, proof summarization, and more.
SEOct 6, 2022Code
Trust in Motion: Capturing Trust Ascendancy in Open-Source Projects using Hybrid AIHuascar Sanchez, Briland Hitaj
Open-source is frequently described as a driver for unprecedented communication and collaboration, and the process works best when projects support teamwork. Yet, open-source cooperation processes in no way protect project contributors from considerations of trust, power, and influence. Indeed, achieving the level of trust necessary to contribute to a project and thus influence its direction is a constant process of change, and developers take many different routes over many communication channels to achieve it. We refer to this process of influence-seeking and trust-building as trust ascendancy. This paper describes a methodology for understanding the notion of trust ascendancy and introduces the capabilities that are needed to localize trust ascendancy operations happening over open-source projects. Much of the prior work in understanding trust in open-source software development has focused on a static view of the problem using different forms of quantity measures. However, trust ascendancy is not static, but rather adapts to changes in the open-source ecosystem in response to new input. This paper is the first attempt to articulate and study these signals from a dynamic view of the problem. In that respect, we identify related work that may help illuminate research challenges, implementation tradeoffs, and complementary solutions. Our preliminary results show the effectiveness of our method at capturing the trust ascendancy developed by individuals involved in a well-documented 2020 social engineering attack. Our future plans highlight research challenges and encourage cross-disciplinary collaboration to create more automated, accurate, and efficient ways to model and then track trust ascendancy in open-source projects.
FLFeb 27, 2023
Revisiting Variable Ordering for Real Quantifier Elimination using Machine LearningJohn Hester, Briland Hitaj, Grant Passmore et al.
Cylindrical Algebraic Decomposition (CAD) is a key proof technique for formal verification of cyber-physical systems. CAD is computationally expensive, with worst-case doubly-exponential complexity. Selecting an optimal variable ordering is paramount to efficient use of CAD. Prior work has demonstrated that machine learning can be useful in determining efficient variable orderings. Much of this work has been driven by CAD problems extracted from applications of the MetiTarski theorem prover. In this paper, we revisit this prior work and consider issues of bias in existing training and test data. We observe that the classical MetiTarski benchmarks are heavily biased towards particular variable orderings. To address this, we apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias. Furthermore, we evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
CVMar 20, 2023
Automatic Measures for Evaluating Generative Design Methods for ArchitectsEric Yeh, Briland Hitaj, Vidyasagar Sadhu et al.
The recent explosion of high-quality image-to-image methods has prompted interest in applying image-to-image methods towards artistic and design tasks. Of interest for architects is to use these methods to generate design proposals from conceptual sketches, usually hand-drawn sketches that are quickly developed and can embody a design intent. More specifically, instantiating a sketch into a visual that can be used to elicit client feedback is typically a time consuming task, and being able to speed up this iteration time is important. While the body of work in generative methods has been impressive, there has been a mismatch between the quality measures used to evaluate the outputs of these systems and the actual expectations of architects. In particular, most recent image-based works place an emphasis on realism of generated images. While important, this is one of several criteria architects look for. In this work, we describe the expectations architects have for design proposals from conceptual sketches, and identify corresponding automated metrics from the literature. We then evaluate several image-to-image generative methods that may address these criteria and examine their performance across these metrics. From these results, we identify certain challenges with hand-drawn conceptual sketches and describe possible future avenues of investigation to address them.
LGDec 4, 2025
Multi-LLM Collaboration for Medication RecommendationHuascar Sanchez, Briland Hitaj, Jules Bergmann et al.
As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language models (LLMs) are susceptible to hallucinations and inconsistency, whereas naive ensembles of models often fail to deliver stable and credible recommendations. Building on our previous work on LLM Chemistry, which quantifies the collaborative compatibility among LLMs, we apply this framework to improve the reliability in medication recommendation from brief clinical vignettes. Our approach leverages multi-LLM collaboration guided by Chemistry-inspired interaction modeling, enabling ensembles that are effective (exploiting complementary strengths), stable (producing consistent quality), and calibrated (minimizing interference and error amplification). We evaluate our Chemistry-based Multi-LLM collaboration strategy on real-world clinical scenarios to investigate whether such interaction-aware ensembles can generate credible, patient-specific medication recommendations. Preliminary results are encouraging, suggesting that LLM Chemistry-guided collaboration may offer a promising path toward reliable and trustworthy AI assistants in clinical practice.
CRMar 6, 2024
Do You Trust Your Model? Emerging Malware Threats in the Deep Learning EcosystemDorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari et al.
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. Our aim with this work is to raise awareness against these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats.
LGOct 4, 2025
LLM Chemistry Estimation for Multi-LLM RecommendationHuascar Sanchez, Briland Hitaj
Multi-LLM collaboration promises accurate, robust, and context-aware solutions, yet existing approaches rely on implicit selection and output assessment without analyzing whether collaborating models truly complement or conflict. We introduce LLM Chemistry -- a framework that measures when LLM combinations exhibit synergistic or antagonistic behaviors that shape collective performance beyond individual capabilities. We formalize the notion of chemistry among LLMs, propose algorithms that quantify it by analyzing interaction dependencies, and recommend optimal model ensembles accordingly. Our theoretical analysis shows that chemistry among collaborating LLMs is most evident under heterogeneous model profiles, with its outcome impact shaped by task type, group size, and complexity. Evaluation on classification, summarization, and program repair tasks provides initial evidence for these task-dependent effects, thereby reinforcing our theoretical results. This establishes LLM Chemistry as both a diagnostic factor in multi-LLM systems and a foundation for ensemble recommendation.
CRSep 12, 2025
LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based SystemsVitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello et al.
The success and wide adoption of generative AI (GenAI), particularly large language models (LLMs), has attracted the attention of cybercriminals seeking to abuse models, steal sensitive data, or disrupt services. Moreover, providing security to LLM-based systems is a great challenge, as both traditional threats to software applications and threats targeting LLMs and their integration must be mitigated. In this survey, we shed light on security and privacy concerns of such LLM-based systems by performing a systematic review and comprehensive categorization of threats and defensive strategies considering the entire software and LLM life cycles. We analyze real-world scenarios with distinct characteristics of LLM usage, spanning from development to operation. In addition, threats are classified according to their severity level and to which scenarios they pertain, facilitating the identification of the most relevant threats. Recommended defense strategies are systematically categorized and mapped to the corresponding life cycle phase and possible attack strategies they attenuate. This work paves the way for consumers and vendors to understand and efficiently mitigate risks during integration of LLMs in their respective solutions or organizations. It also enables the research community to benefit from the discussion of open challenges and edge cases that may hinder the secure and privacy-preserving adoption of LLM-based systems.
CRFeb 12, 2022
TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel CodingGiulio Pagnotta, Dorjan Hitaj, Briland Hitaj et al.
Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that using TATTOOED as their watermarking mechanisms, the DNN owner can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines, and has negligible impact on model performance.
CRJan 21, 2022
FedComm: Federated Learning as a Medium for Covert CommunicationDorjan Hitaj, Giulio Pagnotta, Briland Hitaj et al.
Proposed as a solution to mitigate the privacy implications related to the adoption of deep learning, Federated Learning (FL) enables large numbers of participants to successfully train deep neural networks without having to reveal the actual private training data. To date, a substantial amount of research has investigated the security and privacy properties of FL, resulting in a plethora of innovative attack and defense strategies. This paper thoroughly investigates the communication capabilities of an FL scheme. In particular, we show that a party involved in the FL learning process can use FL as a covert communication medium to send an arbitrary message. We introduce FedComm, a novel multi-system covert-communication technique that enables robust sharing and transfer of targeted payloads within the FL framework. Our extensive theoretical and empirical evaluations show that FedComm provides a stealthy communication channel, with minimal disruptions to the training process. Our experiments show that FedComm successfully delivers 100% of a payload in the order of kilobits before the FL procedure converges. Our evaluation also shows that FedComm is independent of the application domain and the neural network architecture used by the underlying FL scheme.
CROct 30, 2020
Capture the Bot: Using Adversarial Examples to Improve CAPTCHA Robustness to Bot AttacksDorjan Hitaj, Briland Hitaj, Sushil Jajodia et al.
To this date, CAPTCHAs have served as the first line of defense preventing unauthorized access by (malicious) bots to web-based services, while at the same time maintaining a trouble-free experience for human visitors. However, recent work in the literature has provided evidence of sophisticated bots that make use of advancements in machine learning (ML) to easily bypass existing CAPTCHA-based defenses. In this work, we take the first step to address this problem. We introduce CAPTURE, a novel CAPTCHA scheme based on adversarial examples. While typically adversarial examples are used to lead an ML model astray, with CAPTURE, we attempt to make a "good use" of such mechanisms. Our empirical evaluations show that CAPTURE can produce CAPTCHAs that are easy to solve by humans while at the same time, effectively thwarting ML-based bot solvers.
NEDec 5, 2019
Scratch that! An Evolution-based Adversarial Attack against Neural NetworksMalhar Jere, Loris Rossi, Briland Hitaj et al.
We study black-box adversarial attacks for image classifiers in a constrained threat model, where adversaries can only modify a small fraction of pixels in the form of scratches on an image. We show that it is possible for adversaries to generate localized \textit{adversarial scratches} that cover less than $5\%$ of the pixels in an image and achieve targeted success rates of $98.77\%$ and $97.20\%$ on ImageNet and CIFAR-10 trained ResNet-50 models, respectively. We demonstrate that our scratches are effective under diverse shapes, such as straight lines or parabolic B\a'ezier curves, with single or multiple colors. In an extreme condition, in which our scratches are a single color, we obtain a targeted attack success rate of $66\%$ on CIFAR-10 with an order of magnitude fewer queries than comparable attacks. We successfully launch our attack against Microsoft's Cognitive Services Image Captioning API and propose various mitigation strategies.
CRSep 1, 2017
PassGAN: A Deep Learning Approach for Password GuessingBriland Hitaj, Paolo Gasti, Giuseppe Ateniese et al.
State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s5w0rd"). Although these rules work well in practice, expanding them to model further passwords is a laborious task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks, and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a-priori knowledge on passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%-73% more passwords than with HashCat alone. This is remarkable, because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the art rules do not encode.
CRFeb 24, 2017
Deep Models Under the GAN: Information Leakage from Collaborative Deep LearningBriland Hitaj, Giuseppe Ateniese, Fernando Perez-Cruz
Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases. Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a collection of users' private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed where parties locally train their deep learning structures and only share a subset of the parameters in the attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS'15. Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level DP applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack).
CRMay 28, 2015
No Place to Hide that Bytes won't Reveal: Sniffing Location-Based Encrypted Traffic to Track a User's PositionGiuseppe Ateniese, Briland Hitaj, Luigi V. Mancini et al.
News reports of the last few years indicated that several intelligence agencies are able to monitor large networks or entire portions of the Internet backbone. Such a powerful adversary has only recently been considered by the academic literature. In this paper, we propose a new adversary model for Location Based Services (LBSs). The model takes into account an unauthorized third party, different from the LBS provider itself, that wants to infer the location and monitor the movements of a LBS user. We show that such an adversary can extrapolate the position of a target user by just analyzing the size and the timing of the encrypted traffic exchanged between that user and the LBS provider. We performed a thorough analysis of a widely deployed location based app that comes pre-installed with many Android devices: GoogleNow. The results are encouraging and highlight the importance of devising more effective countermeasures against powerful adversaries to preserve the privacy of LBS users.