LGMar 26, 2020
Not All Features Are Equal: Discovering Essential Features for Preserving Prediction PrivacyFatemehsadat Mireshghallah, Mohammadkazem Taram, Ali Jalali et al.
When receiving machine learning services from the cloud, the provider does not need to receive all features; in fact, only a subset of the features are necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider. After identifying the subset, our framework, Cloak, suppresses the rest of the features using utility-preserving constant values that are discovered through a separate gradient-based optimization process. We show that Cloak does not necessarily require collaboration from the service provider beyond its normal service, and can be applied in scenarios where we only have black-box access to the service provider's model. We theoretically guarantee that Cloak's optimizations reduce the upper bound of the Mutual Information (MI) between the data and the sifted representations that are sent out. Experimental results show that Cloak reduces the mutual information between the input and the sifted representations by 85.01% with only a negligible reduction in utility (1.42%). In addition, we show that Cloak greatly diminishes adversaries' ability to learn and infer non-conducive features.
LGMay 14, 2013
Scalable Audience Reach Estimation in Real-time Online AdvertisingAli Jalali, Santanu Kolay, Peter Foldes et al.
Online advertising has been introduced as one of the most efficient methods of advertising throughout the recent years. Yet, advertisers are concerned about the efficiency of their online advertising campaigns and consequently, would like to restrict their ad impressions to certain websites and/or certain groups of audience. These restrictions, known as targeting criteria, limit the reachability for better performance. This trade-off between reachability and performance illustrates a need for a forecasting system that can quickly predict/estimate (with good accuracy) this trade-off. Designing such a system is challenging due to (a) the huge amount of data to process, and, (b) the need for fast and accurate estimates. In this paper, we propose a distributed fault tolerant system that can generate such estimates fast with good accuracy. The main idea is to keep a small representative sample in memory across multiple machines and formulate the forecasting problem as queries against the sample. The key challenge is to find the best strata across the past data, perform multivariate stratified sampling while ensuring fuzzy fall-back to cover the small minorities. Our results show a significant improvement over the uniform and simple stratified sampling strategies which are currently widely used in the industry.
GTMay 14, 2013
Real Time Bid Optimization with Smooth Budget Delivery in Online AdvertisingKuang-Chih Lee, Ali Jalali, Ali Dasdan
Today, billions of display ad impressions are purchased on a daily basis through a public auction hosted by real time bidding (RTB) exchanges. A decision has to be made for advertisers to submit a bid for each selected RTB ad request in milliseconds. Restricted by the budget, the goal is to buy a set of ad impressions to reach as many targeted users as possible. A desired action (conversion), advertiser specific, includes purchasing a product, filling out a form, signing up for emails, etc. In addition, advertisers typically prefer to spend their budget smoothly over the time in order to reach a wider range of audience accessible throughout a day and have a sustainable impact. However, since the conversions occur rarely and the occurrence feedback is normally delayed, it is very challenging to achieve both budget and performance goals at the same time. In this paper, we present an online approach to the smooth budget delivery while optimizing for the conversion performance. Our algorithm tries to select high quality impressions and adjust the bid price based on the prior performance distribution in an adaptive manner by distributing the budget optimally across time. Our experimental results from real advertising campaigns demonstrate the effectiveness of our proposed approach.
MLJun 7, 2012
A New Greedy Algorithm for Multiple Sparse RegressionAli Jalali, Sujay Sanghavi
This paper proposes a new algorithm for multiple sparse regression in high dimensions, where the task is to estimate the support and values of several (typically related) sparse vectors from a few noisy linear measurements. Our algorithm is a "forward-backward" greedy procedure that -- uniquely -- operates on two distinct classes of objects. In particular, we organize our target sparse vectors as a matrix; our algorithm involves iterative addition and removal of both (a) individual elements, and (b) entire rows (corresponding to shared features), of the matrix. Analytically, we establish that our algorithm manages to recover the supports (exactly) and values (approximately) of the sparse vectors, under assumptions similar to existing approaches based on convex optimization. However, our algorithm has a much smaller computational complexity. Perhaps most interestingly, it is seen empirically to require visibly fewer samples. Ours represents the first attempt to extend greedy algorithms to the class of models that can only/best be represented by a combination of component structural assumptions (sparse and group-sparse, in our case).
LGMar 30, 2012
A Lipschitz Exploration-Exploitation Scheme for Bayesian OptimizationAli Jalali, Javad Azimi, Xiaoli Fern et al.
The problem of optimizing unknown costly-to-evaluate functions has been studied for a long time in the context of Bayesian Optimization. Algorithms in this field aim to find the optimizer of the function by asking only a few function evaluations at locations carefully selected based on a posterior model. In this paper, we assume the unknown function is Lipschitz continuous. Leveraging the Lipschitz property, we propose an algorithm with a distinct exploration phase followed by an exploitation phase. The exploration phase aims to select samples that shrink the search space as much as possible. The exploitation phase then focuses on the reduced search space and selects samples closest to the optimizer. Considering the Expected Improvement (EI) as a baseline, we empirically show that the proposed algorithm significantly outperforms EI.
LGFeb 25, 2012
Clustering using Max-norm Constrained OptimizationAli Jalali, Nathan Srebro
We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than previously suggested nuclear-norm relaxation, and study the effectiveness of our method, and other related convex relaxations, compared to other clustering approaches.
AIFeb 25, 2012
Hybrid Batch Bayesian OptimizationJavad Azimi, Ali Jalali, Xiaoli Fern
Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using Gaussian process as the posterior estimator and provide a hybrid algorithm that, based on the current state, dynamically switches between a sequential policy and a batch policy with variable batch sizes. We provide theoretical justification for our algorithm and present experimental results on eight benchmark BO problems. The results show that our method achieves substantial speedup (up to %78) compared to a pure sequential policy, without suffering any significant performance loss.