Direct Amortized Likelihood Ratio EstimationAdam D. Cobb, Brian Matejek, Daniel Elenius et al.
We introduce a new amortized likelihood ratio estimator for likelihood-free simulation-based inference (SBI). Our estimator is simple to train and estimates the likelihood ratio using a single forward pass of the neural estimator. Our approach directly computes the likelihood ratio between two competing parameter sets which is different from the previous approach of comparing two neural network output values. We refer to our model as the direct neural ratio estimator (DNRE). As part of introducing the DNRE, we derive a corresponding Monte Carlo estimate of the posterior. We benchmark our new ratio estimator and compare to previous ratio estimators in the literature. We show that our new ratio estimator often outperforms these previous approaches. As a further contribution, we introduce a new derivative estimator for likelihood ratio estimators that enables us to compare likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH). We show that HMC is equally competitive, which has not been previously shown. Finally, we include a novel real-world application of SBI by using our neural ratio estimator to design a quadcopter. Code is available at https://github.com/SRI-CSL/dnre.
4.6LGNov 11, 2022
Design of Unmanned Air Vehicles Using Transformer Surrogate ModelsAdam D. Cobb, Anirban Roy, Daniel Elenius et al.
Computer-aided design (CAD) is a promising new area for the application of artificial intelligence (AI) and machine learning (ML). The current practice of design of cyber-physical systems uses the digital twin methodology, wherein the actual physical design is preceded by building detailed models that can be evaluated by physics simulation models. These physics models are often slow and the manual design process often relies on exploring near-by variations of existing designs. AI holds the promise of breaking these design silos and increasing the diversity and performance of designs by accelerating the exploration of the design space. In this paper, we focus on the design of electrical unmanned aerial vehicles (UAVs). The high-density batteries and purely electrical propulsion systems have disrupted the space of UAV design, making this domain an ideal target for AI-based design. In this paper, we develop an AI Designer that synthesizes novel UAV designs. Our approach uses a deep transformer model with a novel domain-specific encoding such that we can evaluate the performance of new proposed designs without running expensive flight dynamics models and CAD tools. We demonstrate that our approach significantly reduces the overall compute requirements for the design process and accelerates the design space exploration. Finally, we identify future research directions to achieve full-scale deployment of AI-assisted CAD for UAVs.
Privacy Preserving In-Context-Learning Framework for Large Language ModelsBishnu Bhusal, Manoj Acharya, Ramneet Kaur et al.
Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information leakage, where adversaries can extract sensitive information embedded in the prompts. In this work, we introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees. Our approach leverages the Differential Privacy (DP) framework to ensure worst-case theoretical bounds on information leakage without requiring any fine-tuning of the underlying models. The proposed method performs inference on private records and aggregates the resulting per-token output distributions. This enables the generation of longer and coherent synthetic text while maintaining privacy guarantees. Additionally, we propose a simple blending operation that combines private and public inference to further enhance utility. Empirical evaluations demonstrate that our approach outperforms previous state-of-the-art methods on in-context-learning (ICL) tasks, making it a promising direction for privacy-preserving text generation while maintaining high utility. Our code is available at https://github.com/bhusalb/privacy-preserving-icl.
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace InferenceColin Samplawski, Adam D. Cobb, Manoj Acharya et al.
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs due to requiring further additional parameters compared to LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as a many base parameters as prior work.
5.5LGJul 15, 2021
Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte CarloVyacheslav Kungurtsev, Adam Cobb, Tara Javidi et al.
Federated learning performed by a decentralized networks of agents is becoming increasingly important with the prevalence of embedded software on autonomous devices. Bayesian approaches to learning benefit from offering more information as to the uncertainty of a random quantity, and Langevin and Hamiltonian methods are effective at realizing sampling from an uncertain distribution with large parameter dimensions. Such methods have only recently appeared in the decentralized setting, and either exclusively use stochastic gradient Langevin and Hamiltonian Monte Carlo approaches that require a diminishing stepsize to asymptotically sample from the posterior and are known in practice to characterize uncertainty less faithfully than constant step-size methods with a Metropolis adjustment, or assume strong convexity properties of the potential function. We present the first approach to incorporating constant stepsize Metropolis-adjusted HMC in the decentralized sampling framework, show theoretical guarantees for consensus and probability distance to the posterior stationary distribution, and demonstrate their effectiveness numerically on standard real world problems, including decentralized learning of neural networks which is known to be highly non-convex.
Scaling Hamiltonian Monte Carlo Inference for Bayesian Neural Networks with Symmetric SplittingAdam D. Cobb, Brian Jalaian
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) approach that exhibits favourable exploration properties in high-dimensional models such as neural networks. Unfortunately, HMC has limited use in large-data regimes and little work has explored suitable approaches that aim to preserve the entire Hamiltonian. In our work, we introduce a new symmetric integration scheme for split HMC that does not rely on stochastic gradients. We show that our new formulation is more efficient than previous approaches and is easy to implement with a single GPU. As a result, we are able to perform full HMC over common deep learning architectures using entire data sets. In addition, when we compare with stochastic gradient MCMC, we show that our method achieves better performance in both accuracy and uncertainty quantification. Our approach demonstrates HMC as a feasible option when considering inference schemes for large-scale machine learning problems.