Jay Roberts

LG
h-index2
5papers
25citations
Novelty54%
AI Score46

5 Papers

78.4AIMay 26Code
Laguna M.1/XS.2 Technical Report

Julien Abadji, Marah Abdin, Connor Adams et al.

We present Laguna M.1 and Laguna XS.2, two Mixture-of-Experts foundation models built for long-horizon, agentic coding: M.1 has $225.8$B total parameters ($23.4$B activated per token) and XS.2 has $33.4$B total ($3$B activated). Both models were trained from scratch end-to-end inside the same internal system that we refer to as our Model Factory: a tightly-integrated stack of versioned data, training, evaluation, and inference components that turn model development into an industrial process. We describe the principles and design choices of the Model Factory and also detail the end-to-end training process of our models, throughout pre-training data and architecture, post-training stages, evaluation, and quantization. On agentic software engineering and terminal benchmarks (SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, and Terminal-Bench 2.0) M.1 and XS.2 are competitive with state-of-the-art open models in their respective weight classes. Laguna XS.2 weights are released under Apache~2.0 at https://huggingface.co/collections/poolside/laguna-xs2.

LGJun 11, 2025
Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

Jay Roberts, Kyle Mylonakis, Sidhartha Roy et al.

The high cost of ownership of AI compute infrastructure and challenges of robust serving of large language models (LLMs) has led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many teams in order to maximize the return on investment. In both scenarios the deployed models operate only on plaintext data, and so enterprise data owners must allow their data to appear in plaintext on a shared or multi-tenant compute infrastructure. This results in data owners with private or sensitive data being hesitant or restricted in what data they use with these types of deployments. In this work we introduce the Stained Glass Transform, a learned, stochastic, and sequence dependent transformation of the word embeddings of an LLM which information theoretically provides privacy to the input of the LLM while preserving the utility of model. We theoretically connect a particular class of Stained Glass Transforms to the theory of mutual information of Gaussian Mixture Models. We then calculate a-postiori privacy estimates, based on mutual information, and verify the privacy and utility of instances of transformed embeddings through token level metrics of privacy and standard LLM performance benchmarks.

LGDec 22, 2020
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

Theodoros Tsiligkaridis, Jay Roberts

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense. Due to the high computation time for generating strong adversarial examples in the AT process, single-step approaches have been proposed to reduce training time. However, these methods suffer from catastrophic overfitting where adversarial accuracy drops during training, and although improvements have been proposed, they increase training time and robustness is far from that of multi-step AT. We develop a theoretical framework for adversarial training with FW optimization (FW-AT) that reveals a geometric connection between the loss landscape and the $\ell_2$ distortion of $\ell_\infty$ FW attacks. We analytically show that high distortion of FW attacks is equivalent to small gradient variation along the attack path. It is then experimentally demonstrated on various deep neural network architectures that $\ell_\infty$ attacks against robust models achieve near maximal distortion, while standard networks have lower distortion. It is experimentally shown that catastrophic overfitting is strongly correlated with low distortion of FW attacks. This mathematical transparency differentiates FW from Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework we develop FW-AT-Adapt, a novel adversarial training algorithm which uses a simple distortion measure to adapt the number of attack steps during training to increase efficiency without compromising robustness. FW-AT-Adapt provides training time on par with single-step fast AT methods and closes the gap between fast AT methods and multi-step PGD-AT with minimal loss in adversarial accuracy in white-box and black-box settings.

IVNov 30, 2020
Ultrasound Diagnosis of COVID-19: Robustness and Explainability

Jay Roberts, Theodoros Tsiligkaridis

Diagnosis of COVID-19 at point of care is vital to the containment of the global pandemic. Point of care ultrasound (POCUS) provides rapid imagery of lungs to detect COVID-19 in patients in a repeatable and cost effective way. Previous work has used public datasets of POCUS videos to train an AI model for diagnosis that obtains high sensitivity. Due to the high stakes application we propose the use of robust and explainable techniques. We demonstrate experimentally that robust models have more stable predictions and offer improved interpretability. A framework of contrastive explanations based on adversarial perturbations is used to explain model predictions that aligns with human visual perception.

LGSep 10, 2020
Second Order Optimization for Adversarial Robustness and Interpretability

Theodoros Tsiligkaridis, Jay Roberts

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique aimed at learning features robust to such attacks and is widely regarded as a very effective defense. However, the computational cost of such training can be prohibitive as the network size and input dimensions grow. Inspired by the relationship between robustness and curvature, we propose a novel regularizer which incorporates first and second order information via a quadratic approximation to the adversarial loss. The worst case quadratic loss is approximated via an iterative scheme. It is shown that using only a single iteration in our regularizer achieves stronger robustness than prior gradient and curvature regularization schemes, avoids gradient obfuscation, and, with additional iterations, achieves strong robustness with significantly lower training time than AT. Further, it retains the interesting facet of AT that networks learn features which are well-aligned with human perception. We demonstrate experimentally that our method produces higher quality human-interpretable features than other geometric regularization techniques. These robust features are then used to provide human-friendly explanations to model predictions.