CRAug 1, 2025Code
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical ReportSajana Weerawardhena, Paul Kassianik, Blaine Nelson et al.
Large language models (LLMs) have shown remarkable success across many domains, yet their integration into cybersecurity applications remains limited due to a lack of general-purpose cybersecurity data, representational complexity, and safety and regulatory concerns. To address this gap, we previously introduced Foundation-Sec-8B, a cybersecurity-focused LLM suitable for fine-tuning on downstream tasks. That model, however, was not designed for chat-style interactions or instruction-following. In this report, we release Foundation-Sec-8B-Instruct: a model specifically trained for general-purpose cybersecurity dialogue. Built on Foundation-Sec-8B, it combines domain-specific knowledge with instruction-following, conversational capabilities, and alignment with human preferences to produce high-quality, relevant responses. Comprehensive evaluations show that Foundation-Sec-8B-Instruct outperforms Llama 3.1-8B-Instruct on a range of cybersecurity tasks while matching its instruction-following performance. It is also competitive with GPT-4o-mini on cyber threat intelligence and instruction-following tasks. We envision Foundation-Sec-8B-Instruct becoming an indispensable assistant in the daily workflows of cybersecurity professionals. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct.
LGDec 4, 2023
Tree of Attacks: Jailbreaking Black-Box LLMs AutomaticallyAnay Mehrotra, Manolis Zampetakis, Paul Kassianik et al.
While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. TAP utilizes an attacker LLM to iteratively refine candidate (attack) prompts until one of the refined prompts jailbreaks the target. In addition, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks, reducing the number of queries sent to the target LLM. In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4-Turbo and GPT4o) for more than 80% of the prompts. This significantly improves upon the previous state-of-the-art black-box methods for generating jailbreaks while using a smaller number of queries than them. Furthermore, TAP is also capable of jailbreaking LLMs protected by state-of-the-art guardrails, e.g., LlamaGuard.
AIJan 28Code
Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical ReportZhuoran Yang, Ed Li, Jianliang He et al.
We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Our training leverages proprietary reasoning data spanning cybersecurity analysis, instruction-following, and mathematical reasoning. Evaluation across 10 cybersecurity benchmarks and 10 general-purpose benchmarks demonstrates performance competitive with significantly larger models on cybersecurity tasks while maintaining strong general capabilities. The model shows effective generalization on multi-hop reasoning tasks and strong safety performance when deployed with appropriate system prompts and guardrails. This work demonstrates that domain-specialized reasoning models can achieve strong performance on specialized tasks while maintaining broad general capabilities. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Reasoning.
LGOct 22, 2020Code
Adversarial Attacks on Binary Image Recognition SystemsEric Balkanski, Harrison Chase, Kojin Oshiba et al.
We initiate the study of adversarial attacks on models for binary (i.e. black and white) image classification. Although there has been a great deal of work on attacking models for colored and grayscale images, little is known about attacks on models for binary images. Models trained to classify binary images are used in text recognition applications such as check processing, license plate recognition, invoice processing, and many others. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted and noise cannot be hidden with minor perturbations in each pixel. Thus, the optimization landscape of attacks on binary images introduces new fundamental challenges. In this paper we introduce a new attack algorithm called SCAR, designed to fool classifiers of binary images. We show that SCAR significantly outperforms existing $L_0$ attacks applied to the binary setting and use it to demonstrate the vulnerability of real-world text recognition systems. SCAR's strong performance in practice contrasts with the existence of classifiers that are provably robust to large perturbations. In many cases, altering a single pixel is sufficient to trick Tesseract, a popular open-source text recognition system, to misclassify a word as a different word in the English dictionary. We also license software from providers of check processing systems to most of the major US banks and demonstrate the vulnerability of check recognitions for mobile deposits. These systems are substantially harder to fool since they classify both the handwritten amounts in digits and letters, independently. Nevertheless, we generalize SCAR to design attacks that fool state-of-the-art check processing systems using unnoticeable perturbations that lead to misclassification of deposit amounts. Consequently, this is a powerful method to perform financial fraud.
LGFeb 3, 2025
Adversarial Reasoning at Jailbreaking TimeMahdi Sabbaghi, Paul Kassianik, George Pappas et al.
As large language models (LLMs) are becoming more capable and widespread, the study of their failure cases is becoming increasingly important. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs. We develop an adversarial reasoning approach to automatic jailbreaking that leverages a loss signal to guide the test-time compute, achieving SOTA attack success rates against many aligned LLMs, even those that aim to trade inference-time compute for adversarial robustness. Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
CRApr 28, 2025
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical ReportPaul Kassianik, Baturay Saglam, Alexander Chen et al.
As transformer-based large language models (LLMs) increasingly permeate society, they have revolutionized domains such as software engineering, creative writing, and digital arts. However, their adoption in cybersecurity remains limited due to challenges like scarcity of specialized training data and complexity of representing cybersecurity-specific knowledge. To address these gaps, we present Foundation-Sec-8B, a cybersecurity-focused LLM built on the Llama 3.1 architecture and enhanced through continued pretraining on a carefully curated cybersecurity corpus. We evaluate Foundation-Sec-8B across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
CLJul 13, 2025
Large Language Models Encode Semantics in Low-Dimensional Linear SubspacesBaturay Saglam, Paul Kassianik, Blaine Nelson et al.
Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. However, it remains unclear to what extent LLMs internally organize representations related to semantic understanding. To explore this, we conduct a large-scale empirical study of hidden representations in 11 autoregressive models across 6 scientific topics. We find that high-level semantic information consistently resides in low-dimensional subspaces that form linearly separable representations across domains. This separability becomes more pronounced in deeper layers and under prompts that elicit structured reasoning or alignment behavior$\unicode{x2013}$even when surface content remains unchanged. These findings support geometry-aware tools that operate directly in latent space to detect and mitigate harmful or adversarial content. As a proof of concept, we train an MLP probe on final-layer hidden states to act as a lightweight latent-space guardrail. This approach substantially improves refusal rates on malicious queries and prompt injections that bypass both the model's built-in safety alignment and external token-level filters.
LGFeb 23, 2021
Instance Specific Approximations for Submodular MaximizationEric Balkanski, Sharon Qian, Yaron Singer
For many optimization problems in machine learning, finding an optimal solution is computationally intractable and we seek algorithms that perform well in practice. Since computational intractability often results from pathological instances, we look for methods to benchmark the performance of algorithms against optimal solutions on real-world instances. The main challenge is that an optimal solution cannot be efficiently computed for intractable problems, and we therefore often do not know how far a solution is from being optimal. A major question is therefore how to measure the performance of an algorithm in comparison to an optimal solution on instances we encounter in practice. In this paper, we address this question in the context of submodular optimization problems. For the canonical problem of submodular maximization under a cardinality constraint, it is intractable to compute a solution that is better than a $1-1/e \approx 0.63$ fraction of the optimum. Algorithms like the celebrated greedy algorithm are guaranteed to achieve this $1-1/e$ bound on any instance and are used in practice. Our main contribution is not a new algorithm for submodular maximization but an analytical method that measures how close an algorithm for submodular maximization is to optimal on a given problem instance. We use this method to show that on a wide variety of real-world datasets and objectives, the approximation of the solution found by greedy goes well beyond $1-1/e$ and is often at least 0.95. We develop this method using a novel technique that lower bounds the objective of a dual minimization problem to obtain an upper bound on the value of an optimal solution to the primal maximization problem.
LGJun 20, 2020
An Optimal Elimination Algorithm for Learning a Best ArmAvinatan Hassidim, Ron Kupfer, Yaron Singer
We consider the classic problem of $(ε,δ)$-PAC learning a best arm where the goal is to identify with confidence $1-δ$ an arm whose mean is an $ε$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(ε,δ)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(ε,δ)$-learning the mean of $n$ arms separately and we complement this result with a conditional matching lower bound. More specifically:
CLApr 26, 2020
Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender BiasJesse Vig, Sebastian Gehrmann, Yonatan Belinkov et al.
Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators.
LGFeb 21, 2020
Robustness from Simple ClassifiersSharon Qian, Dimitris Kalimeris, Gal Kaplun et al.
Despite the vast success of Deep Neural Networks in numerous application domains, it has been shown that such models are not robust i.e., they are vulnerable to small adversarial perturbations of the input. While extensive work has been done on why such perturbations occur or how to successfully defend against them, we still do not have a complete understanding of robustness. In this work, we investigate the connection between robustness and simplicity. We find that simpler classifiers, formed by reducing the number of output classes, are less susceptible to adversarial perturbations. Consequently, we demonstrate that decomposing a complex multiclass model into an aggregation of binary models enhances robustness. This behavior is consistent across different datasets and model architectures and can be combined with known defense techniques such as adversarial training. Moreover, we provide further evidence of a disconnect between standard and robust learning regimes. In particular, we show that elaborate label information can help standard accuracy but harm robustness.
LGJul 14, 2019
The FAST Algorithm for Submodular MaximizationAdam Breuer, Eric Balkanski, Yaron Singer
In this paper we describe a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint $k$ whose approximation ratio is arbitrarily close to $1-1/e$, is $O(\log(n) \log^2(\log k))$ adaptive, and uses a total of $O(n \log\log(k))$ queries. Recent algorithms have comparable guarantees in terms of asymptotic worst case analysis, but their actual number of rounds and query complexity depend on very large constants and polynomials in terms of precision and confidence, making them impractical for large data sets. Our main contribution is a design that is extremely efficient both in terms of its non-asymptotic worst case query complexity and number of rounds, and in terms of its practical runtime. We show that this algorithm outperforms any algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms, by running experiments on large data sets. These experiments show FAST is orders of magnitude faster than the state-of-the-art.
LGJun 14, 2019
Predicting Choice with Set-Dependent AggregationNir Rosenfeld, Kojin Oshiba, Yaron Singer
Providing users with alternatives to choose from is an essential component in many online platforms, making the accurate prediction of choice vital to their success. A renewed interest in learning choice models has led to significant progress in modeling power, but most current methods are either limited in the types of choice behavior they capture, cannot be applied to large-scale data, or both. Here we propose a learning framework for predicting choice that is accurate, versatile, theoretically grounded, and scales well. Our key modeling point is that to account for how humans choose, predictive models must capture certain set-related invariances. Building on recent results in economics, we derive a class of models that can express any behavioral choice pattern, enjoy favorable sample complexity guarantees, and can be efficiently trained end-to-end. Experiments on three large choice datasets demonstrate the utility of our approach.
LGJun 6, 2019
Robust Attacks against Multiple ClassifiersJuan C. Perdomo, Yaron Singer
We address the challenge of designing optimal adversarial noise algorithms for settings where a learner has access to multiple classifiers. We demonstrate how this problem can be framed as finding strategies at equilibrium in a two-player, zero-sum game between a learner and an adversary. In doing so, we illustrate the need for randomization in adversarial attacks. In order to compute Nash equilibrium, our main technical focus is on the design of best response oracles that can then be implemented within a Multiplicative Weights Update framework to boost deterministic perturbations against a set of models into optimal mixed strategies. We demonstrate the practical effectiveness of our approach on a series of image classification tasks using both linear classifiers and deep neural networks.
LGMar 9, 2019
Robust Influence Maximization for Hyperparametric ModelsDimitris Kalimeris, Gal Kaplun, Yaron Singer
In this paper, we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. In social networks users influence and are influenced by individuals with similar characteristics and as such, they are associated with some features. A recent surging research direction in influence maximization focuses on the case where the edge probabilities on the graph are not arbitrary but are generated as a function of the features of the users and a global hyperparameter. We propose a model where the objective is to maximize the worst-case number of influenced users for any possible value of that hyperparameter. We provide theoretical results showing that proper robust solution in our model is NP-hard and an algorithm that achieves improper robust optimization. We make-use of sampling based techniques and of the renowned multiplicative weight updates algorithm. Additionally, we validate our method empirically and prove that it outperforms the state-of-the-art robust influence maximization techniques.
LGMar 6, 2019
Fast Parallel Algorithms for Statistical Subset Selection ProblemsSharon Qian, Yaron Singer
In this paper, we propose a new framework for designing fast parallel algorithms for fundamental statistical subset selection tasks that include feature selection and experimental design. Such tasks are known to be weakly submodular and are amenable to optimization via the standard greedy algorithm. Despite its desirable approximation guarantees, the greedy algorithm is inherently sequential and in the worst case, its parallel runtime is linear in the size of the data. Recently, there has been a surge of interest in a parallel optimization technique called adaptive sampling which produces solutions with desirable approximation guarantees for submodular maximization in exponentially faster parallel runtime. Unfortunately, we show that for general weakly submodular functions such accelerations are impossible. The major contribution in this paper is a novel relaxation of submodularity which we call differential submodularity. We first prove that differential submodularity characterizes objectives like feature selection and experimental design. We then design an adaptive sampling algorithm for differentially submodular functions whose parallel runtime is logarithmic in the size of the data and achieves strong approximation guarantees. Through experiments, we show the algorithm's performance is competitive with state-of-the-art methods and obtains dramatic speedups for feature selection and experimental design problems.
MLNov 27, 2018
Robust Classification of Financial RiskSuproteem K. Sarkar, Kojin Oshiba, Daniel Giebisch et al.
Algorithms are increasingly common components of high-impact decision-making, and a growing body of literature on adversarial examples in laboratory settings indicates that standard machine learning models are not robust. This suggests that real-world systems are also susceptible to manipulation or misclassification, which especially poses a challenge to machine learning models used in financial services. We use the loan grade classification problem to explore how machine learning models are sensitive to small changes in user-reported data, using adversarial attacks documented in the literature and an original, domain-specific attack. Our work shows that a robust optimization algorithm can build models for financial services that are resistant to misclassification on perturbations. To the best of our knowledge, this is the first study of adversarial attacks and defenses for deep learning in financial services.
LGAug 12, 2018
Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex MinimizationEric Balkanski, Yaron Singer
In this paper we study the limitations of parallelization in convex optimization. A convenient approach to study parallelization is through the prism of \emph{adaptivity} which is an information theoretic measure of the parallel runtime of an algorithm [BS18]. Informally, adaptivity is the number of sequential rounds an algorithm needs to make when it can execute polynomially-many queries in parallel at every round. For combinatorial optimization with black-box oracle access, the study of adaptivity has recently led to exponential accelerations in parallel runtime and the natural question is whether dramatic accelerations are achievable for convex optimization. For the problem of minimizing a non-smooth convex function $f:[0,1]^n\to \mathbb{R}$ over the unit Euclidean ball, we give a tight lower bound that shows that even when $\texttt{poly}(n)$ queries can be executed in parallel, there is no randomized algorithm with $\tilde{o}(n^{1/3})$ rounds of adaptivity that has convergence rate that is better than those achievable with a one-query-per-round algorithm. A similar lower bound was obtained by Nemirovski [Nem94], however that result holds for the $\ell_{\infty}$-setting instead of $\ell_2$. In addition, we also show a tight lower bound that holds for Lipschitz and strongly convex functions. At the time of writing this manuscript we were not aware of Nemirovski's result. The construction we use is similar to the one in [Nem94], though our analysis is different. Due to the close relationship between this work and [Nem94], we view the research contribution of this manuscript limited and it should serve as an instructful approach to understanding lower bounds for parallel optimization.
LGJul 4, 2017
Robust Optimization for Non-Convex ObjectivesRobert Chen, Brendan Lucier, Yaron Singer et al.
We consider robust optimization problems, where the goal is to optimize in the worst case over a class of objective functions. We develop a reduction from robust improper optimization to Bayesian optimization: given an oracle that returns $α$-approximate solutions for distributions over objectives, we compute a distribution over solutions that is $α$-approximate in the worst case. We show that de-randomizing this solution is NP-hard in general, but can be done for a broad class of statistical learning tasks. We apply our results to robust neural network training and submodular optimization. We evaluate our approach experimentally on corrupted character classification, and robust influence maximization in networks.
DSJan 12, 2016
Submodular Optimization under NoiseAvinatan Hassidim, Yaron Singer
We consider the problem of maximizing a monotone submodular function under noise. There has been a great deal of work on optimization of submodular functions under various constraints, resulting in algorithms that provide desirable approximation guarantees. In many applications, however, we do not have access to the submodular function we aim to optimize, but rather to some erroneous or noisy version of it. This raises the question of whether provable guarantees are obtainable in presence of error and noise. We provide initial answers, by focusing on the question of maximizing a monotone submodular function under a cardinality constraint when given access to a noisy oracle of the function. We show that: - For a cardinality constraint $k \geq 2$, there is an approximation algorithm whose approximation ratio is arbitrarily close to $1-1/e$; - For $k=1$ there is an algorithm whose approximation ratio is arbitrarily close to $1/2$. No randomized algorithm can obtain an approximation ratio better than $1/2+o(1)$; -If the noise is adversarial, no non-trivial approximation guarantee can be obtained.
DSDec 19, 2015
The Limitations of Optimization from SamplesEric Balkanski, Aviad Rubinstein, Yaron Singer
In this paper we consider the following question: can we optimize objective functions from the training data we use to learn them? We formalize this question through a novel framework we call optimization from samples (OPS). In OPS, we are given sampled values of a function drawn from some distribution and the objective is to optimize the function under some constraint. While there are interesting classes of functions that can be optimized from samples, our main result is an impossibility. We show that there are classes of functions which are statistically learnable and optimizable, but for which no reasonable approximation for optimization from samples is achievable. In particular, our main result shows that there is no constant factor approximation for maximizing coverage functions under a cardinality constraint using polynomially-many samples drawn from any distribution. We also show tight approximation guarantees for maximization under a cardinality constraint of several interesting classes of functions including unit-demand, additive, and general monotone submodular functions, as well as a constant factor approximation for monotone submodular functions with bounded curvature.