AISep 23, 2022
Augmenting Interpretable Models with LLMs during TrainingChandan Singh, Armin Askari, Rich Caruana et al. · berkeley
Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g. medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1,000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is made available on Github.
SYMar 5, 2017
Effect of Adaptive and Cooperative Adaptive Cruise Control on Throughput of Signalized ArterialsArmin Askari, Daniel Albarnaz Farias, Alex A. Kurzhanskiy et al.
The paper evaluates the influence of the maximum vehicle acceleration and variable proportions of ACC/CACC vehicles on the throughput of an intersection. Two cases are studied: (1) free road downstream of the intersection; and (2) red light at some distance downstream of the intersection. Simulation of a 4-mile stretch of an arterial with 13 signalized intersections is used to evaluate the impact of (C)ACC vehicles on the mean and standard deviation of travel time as the proportion of (C)ACC vehicles is increased. The results suggest a very high urban mobility benefit of (C)ACC vehicles at little or no cost in infrastructure.
SYDec 22, 2017
Measuring Impact of Adaptive and Cooperative Adaptive Cruise Control on Throughput of Signalized IntersectionsArmin Askari, Daniel Albarnaz Farias, Alex A. Kurzhanskiy et al.
To properly assess the impact of (cooperative) adaptive cruise control ACC (CACC), one has to model vehicle dynamics. First of all, one has to choose the car following model, as it determines the vehicle flow as vehicles accelerate from standstill or decelerate because of the obstacle ahead. The other factor significantly affecting the intersection throughput is the maximal vehicle acceleration rate. In this paper, we analyze three car following behaviors: Gipps model, Improved Intelligent Driver Model (IIDM) and Helly model. Gipps model exhibits rather aggressive acceleration behavior. If used for the intersection throughput estimation, this model would lead to overly optimistic results. Helly model is convenient to analyze due to its linear nature, but its deceleration behavior in the presence of obstacles ahead is unrealistically abrupt. Showing the most realistic acceleration and deceleration behavior of the three models, IIDM is suited for ACC/CACC impact evaluation better than the other two. We discuss the influence of the maximal vehicle acceleration rate and presence of different portions of ACC/CACC vehicles on intersection throughput in the context of the three car following models. The analysis is done for two cases: (1) free road downstream of the intersection; and (2) red light at some distance downstream of the intersection. Finally, we introduce the platoon model and evaluate ACC and CACC with platooning in terms of travel time ad network throughput using SUMO simulation of the 4-mile stretch of Colorado Boulevard / Huntington Drive arterial with 13 signalized intersections in Arcadia, Southern California.
LGJun 15, 2020
FANOK: Knockoffs in Linear TimeArmin Askari, Quentin Rebjock, Alexandre d'Aspremont et al.
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems. Identifying the knockoff distribution requires solving a large scale semidefinite program for which we derive several efficient methods. One handles generic covariance matrices, has a complexity scaling as $O(p^3)$ where $p$ is the ambient dimension, while another assumes a rank $k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^2)$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500,000$.
LGAug 17, 2019
Implicit Deep LearningLaurent El Ghaoui, Fangda Gu, Bertrand Travacca et al.
Implicit deep learning prediction rules generalize the recursive rules of feedforward neural networks. Such rules are based on the solution of a fixed-point equation involving a single vector of hidden features, which is thus only implicitly defined. The implicit framework greatly simplifies the notation of deep learning, and opens up many new possibilities, in terms of novel architectures and algorithms, robustness analysis and design, interpretability, sparsity, and network architecture optimization.
LGMay 23, 2019
Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive BayesArmin Askari, Alexandre d'Aspremont, Laurent El Ghaoui
Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our convex relaxation bounds becomes tight as the marginal contribution of additional features decreases, using a priori duality gap bounds dervied from the Shapley-Folkman theorem. We show how to produce primal solutions satisfying these bounds. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster.
LGNov 20, 2018
Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network TrainingFangda Gu, Armin Askari, Laurent El Ghaoui
Despite the recent successes of deep neural networks, the corresponding training problem remains highly non-convex and difficult to optimize. Classes of models have been proposed that introduce greater structure to the objective function at the cost of lifting the dimension of the problem. However, these lifted methods sometimes perform poorly compared to traditional neural networks. In this paper, we introduce a new class of lifted models, Fenchel lifted networks, that enjoy the same benefits as previous lifted models, without suffering a degradation in performance over classical networks. Our model represents activation functions as equivalent biconvex constraints and uses Lagrange Multipliers to arrive at a rigorous lower bound of the traditional neural network training problem. This model is efficiently trained using block-coordinate descent and is parallelizable across data points and/or layers. We compare our model against standard fully connected and convolutional networks and show that we are able to match or beat their performance.
LGNov 6, 2018
Greedy Frank-Wolfe Algorithm for Exemplar SelectionGary Cheng, Armin Askari, Kannan Ramchandran et al.
In this paper, we consider the problem of selecting representatives from a data set for arbitrary supervised/unsupervised learning tasks. We identify a subset $S$ of a data set $A$ such that 1) the size of $S$ is much smaller than $A$ and 2) $S$ efficiently describes the entire data set, in a way formalized via convex optimization. In order to generate $|S| = k$ exemplars, our kernelizable algorithm, Frank-Wolfe Sparse Representation (FWSR), only needs to execute $\approx k$ iterations with a per-iteration cost that is quadratic in the size of $A$. This is in contrast to other state of the art methods which need to execute until convergence with each iteration costing an extra factor of $d$ (dimension of the data). Moreover, we also provide a proof of linear convergence for our method. We support our results with empirical experiments; we test our algorithm against current methods in three different experimental setups on four different data sets. FWSR outperforms other exemplar finding methods both in speed and accuracy in almost all scenarios.
MLJun 18, 2018
Kernel-based Outlier Detection using the Inverse Christoffel FunctionArmin Askari, Forest Yang, Laurent El Ghaoui
Outlier detection methods have become increasingly relevant in recent years due to increased security concerns and because of its vast application to different fields. Recently, Pauwels and Lasserre (2016) noticed that the sublevel sets of the inverse Christoffel function accurately depict the shape of a cloud of data using a sum-of-squares polynomial and can be used to perform outlier detection. In this work, we propose a kernelized variant of the inverse Christoffel function that makes it computationally tractable for data sets with a large number of features. We compare our approach to current methods on 15 different data sets and achieve the best average area under the precision recall curve (AUPRC) score, the best average rank and the lowest root mean square deviation.
LGMay 3, 2018
Lifted Neural NetworksArmin Askari, Geoffrey Negiar, Rajiv Sambharya et al.
We describe a novel family of models of multi- layer feedforward neural networks in which the activation functions are encoded via penalties in the training problem. Our approach is based on representing a non-decreasing activation function as the argmin of an appropriate convex optimiza- tion problem. The new framework allows for algo- rithms such as block-coordinate descent methods to be applied, in which each step is composed of a simple (no hidden layer) supervised learning problem that is parallelizable across data points and/or layers. Experiments indicate that the pro- posed models provide excellent initial guesses for weights for standard neural networks. In addi- tion, the model provides avenues for interesting extensions, such as robustness against noisy in- puts and optimizing over parameters in activation functions.