Mahnoosh Alizadeh

LG
h-index32
33papers
448citations
Novelty51%
AI Score55

33 Papers

SYFeb 16, 2013
Real-Time Power Balancing via Decentralized Coordinated Home Energy Scheduling

Tsung-Hui Chang, Mahnoosh Alizadeh, Anna Scaglione

It is anticipated that an uncoordinated operation of individual home energy management (HEM) systems in a neighborhood would have a rebound effect on the aggregate demand profile. To address this issue, this paper proposes a coordinated home energy management (CoHEM) architecture in which distributed HEM units collaborate with each other in order to keep the demand and supply balanced in their neighborhood. Assuming the energy requests by customers are random in time, we formulate the proposed CoHEM design as a multi-stage stochastic optimization problem. We propose novel models to describe the deferrable appliance load (e.g., Plug-in (Hybrid) Electric Vehicles (PHEV)), and apply approximation and decomposition techniques to handle the considered design problem in a decentralized fashion. The developed decentralized CoHEM algorithm allow the customers to locally compute their scheduling solutions using domestic user information and with message exchange between their neighbors only. Extensive simulation results demonstrate that the proposed CoHEM architecture can effectively improve real-time power balancing. Extensions to joint power procurement and real-time CoHEM scheduling are also presented.

SYApr 10, 2012
Coordinated Home Energy Management for Real-Time Power Balancing

Tsung-Hui Chang, Mahnoosh Alizadeh, Anna Scaglione

This paper proposes a coordinated home energy management system (HEMS) architecture where the distributed residential units cooperate with each other to achieve real-time power balancing. The economic benefits for the retailer and incentives for the customers to participate in the proposed coordinated HEMS program are given. We formulate the coordinated HEMS design problem as a dynamic programming (DP) and use approximate DP approaches to efficiently handle the design problem. A distributed implementation algorithm based on the convex optimization based dual decomposition technique is also presented. Our focus in the current paper is on the deferrable appliances, such as Plug-in (Hybrid) Electric Vehicles (PHEV), in view of their higher impact on the grid stability. Simulation results shows that the proposed coordinated HEMS architecture can efficiently improve the real-time power balancing.

SYAug 26, 2019
Pricing and Routing Mechanisms for Differentiated Services in an Electric Vehicle Public Charging Station Network

Ahmadreza Moradipari, Mahnoosh Alizadeh

We consider a Charging Network Operator (CNO) that owns a network of Electric Vehicle (EV) public charging stations and wishes to offer a menu of differentiated service options for access to its stations. This involves designing optimal pricing and routing schemes for the setting where users cannot directly choose which station they use. Instead, they choose their priority level and energy request amount from the differentiated service menu, and then the CNO directly assigns them to a station on their path. This allows higher priority users to experience lower wait times at stations, and allows the CNO to directly manage demand, exerting a higher level of control that can be used to manage the effect of EV on the grid and control station wait times. We consider the scenarios where the CNO is a social welfare-maximizing or a profit-maximizing entity, and in both cases, design pricing-routing policies that ensure users reveal their true parameters to the CNO.

LGMay 12, 2022
Collaborative Multi-agent Stochastic Linear Bandits

Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\Big(\sqrt{\frac{T}{N \log(1/|λ_2|)}}\cdot (\log T)^2\Big)$, where $λ_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.

LGMay 12, 2022
Multi-Environment Meta-Learning in Stochastic Linear Bandits

Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh et al.

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.

65.4LGApr 19
REALM: Reliable Expertise-Aware Language Model Fine-Tuning from Noisy Annotations

Sajjad Ghiasvand, Mark Beliaev, Mahnoosh Alizadeh et al.

Supervised fine-tuning of large language models relies on human-annotated data, yet annotation pipelines routinely involve multiple crowdworkers of heterogeneous expertise. Standard practice aggregates labels via majority vote or simple averaging, discarding annotator identity and causing the model to absorb the errors of unreliable annotators directly into its parameters. We propose REALM, a method that jointly learns the model parameters and a scalar expertise value for each annotator entirely unsupervised, requiring no supervision beyond annotator identity. The key idea is to model each observed label as a mixture between the model's prediction and a uniform random guess, weighted by the annotator's learned expertise. We extend REALM to a multi-task setting via a learned expertise matrix that captures per-annotator reliability across tasks. We evaluate on five question answering benchmarks, fine-tuning three sizes of Flan-T5 under simulated noisy annotations. The proposed algorithm consistently outperforms the naive noisy SFT in the large majority of single- and multi-task settings, across datasets, model sizes, and noise types, with accuracy improvements of up to $50\%$ in the most adversarial regime and gains that grow with model capacity.

LGAug 29, 2023
Directional Optimism for Safe Linear Bandits

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.

SYSep 16, 2020
Mobility-Aware Electric Vehicle Fast Charging Load Models with Geographical Price Variations

Ahmadreza Moradipari, Nathaniel Tucker, Mahnoosh Alizadeh

We study the traffic patterns as well as the charging patterns of a population of cost-minimizing EV owners traveling and charging within a transportation network equipped with fast charging stations (FCSs). Specifically, we study how the charging network operator (CNO) can influence where EV users charge in order to optimize the utilization of fast charging stations. These charging decisions of private EV owners affect aggregate congestion at stations (i.e., waiting time) as well as the aggregate EV charging load across the network. In this work, we capture the resulting equilibrium wait times and electricity load through a so-called \textit{traffic and charge assignment problem} (TCAP) in a fast charging station network. Our formulation allows us to: 1) Study the expected station wait times as well as the probability distribution of aggregate charging load of EVs in a FCS network in a mobility-aware fashion (an aspect unique to our work), while accounting for heterogeneities in users' travel patterns, energy demands, and geographically variant electricity prices. 2) Analytically characterize the special threshold-based structure that determines how EV owners choose where to charge their vehicle at equilibrium, in response to the FCS's charging price structure, users' energy demands, and users' mobility patterns. 3) Provide a convex optimization problem formulation to identify the network's unique equilibrium. Furthermore, we illustrate how to induce a socially optimal charging behavior by deriving the socially optimal plug-in fees and electricity prices at the charging stations.

5.5GTMar 26
Resource Allocation in Strategic Adversarial Interactions: Colonel Blotto Games and Their Applications in Control Systems

Keith Paarporn, Rahul Chandan, Mahnoosh Alizadeh et al.

Resource allocation under strategic adversarial constraints represents a fundamental challenge in control systems, from cybersecurity defense to infrastructure protection. While game-theoretic frameworks have long informed such problems, Colonel Blotto games -- despite their direct relevance to allocation decisions -- remain underutilized and underappreciated in the controls community compared to other game-theoretic models like the Prisoner's Dilemma. The disparity stems largely from analytical complexity: Colonel Blotto games typically require characterizing intricate mixed-strategy equilibria that resist the clean, closed-form solutions control theorists prefer. Yet as Golman and Page observe, this very complexity ``makes Blotto all the more compelling in its interpretations.'' The goal of this expository article is to showcase the power and versatility of Colonel Blotto game frameworks for the controls community, demonstrating how allocation problems across cybersecurity, network defense, and multi-agent systems can be modeled within this unified theoretical structure. We survey recent analytical and computational breakthroughs, highlight diverse applications, and examine extensions addressing incomplete information, network effects, and multi-stage decision-making -- illustrating how Colonel Blotto games provide both practical tools and fundamental insights for strategic resource allocation in adversarial environments.

CVFeb 24
MMLoP: Multi-Modal Low-Rank Prompting for Efficient Vision-Language Adaptation

Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh et al.

Prompt learning has become a dominant paradigm for adapting vision-language models (VLMs) such as CLIP to downstream tasks without modifying pretrained weights. While extending prompts to both vision and text encoders across multiple transformer layers significantly boosts performance, it dramatically increases the number of trainable parameters, with state-of-the-art methods requiring millions of parameters and abandoning the parameter efficiency that makes prompt tuning attractive. In this work, we propose \textbf{MMLoP} (\textbf{M}ulti-\textbf{M}odal \textbf{Lo}w-Rank \textbf{P}rompting), a framework that achieves deep multi-modal prompting with only \textbf{11.5K trainable parameters}, comparable to early text-only methods like CoOp. MMLoP parameterizes vision and text prompts at each transformer layer through a low-rank factorization, which serves as an implicit regularizer against overfitting on few-shot training data. To further close the accuracy gap with state-of-the-art methods, we introduce three complementary components: a self-regulating consistency loss that anchors prompted representations to frozen zero-shot CLIP features at both the feature and logit levels, a uniform drift correction that removes the global embedding shift induced by prompt tuning to preserve class-discriminative structure, and a shared up-projection that couples vision and text prompts through a common low-rank factor to enforce cross-modal alignment. Extensive experiments across three benchmarks and 11 diverse datasets demonstrate that MMLoP achieves a highly favorable accuracy-efficiency tradeoff, outperforming the majority of existing methods including those with orders of magnitude more parameters, while achieving a harmonic mean of 79.70\% on base-to-novel generalization.

46.2OCApr 20
Steady-state Based Approach to Online Non-stochastic Control

Vijeth Hebbar, Spencer Hutchinson, Mahnoosh Alizadeh et al.

We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an \emph{affine controller}. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.

62.0OCApr 24
Near-Optimal Regret for the Safe Learning-based Control of the Constrained Linear Quadratic Regulator

Spencer Hutchinson, Nanfei Jiang, Mahnoosh Alizadeh

We study the problem of adaptive control of the stochastic linear quadratic regulator (LQR) with constraints that must be satisfied at every time step. Prior work on the multidimensional problem has shown $\tilde{O}(T^{2/3})$ regret and satisfaction of robust constraints, leaving open the question of whether $\tilde{O}(\sqrt{T})$ regret can be attained in the constrained LQR setting. We contribute to this problem by showing $\tilde{O}(\sqrt{T})$ regret and satisfaction of chance constraints. This type of constraints allow us to handle unbounded noise and also enable analytical techniques not directly applicable to robust constraints. Our proposed algorithm for this problem uses an SDP to select an optimistic policy, and then "scales back" this policy until it is verifiably-safe. Our theoretical analysis establishes regret and constraint guarantees via a key lemma that bounds the system covariance in terms of the chosen policy. This covariance-based analysis is in contrast with the cost-to-go based analysis that is typically used in adaptive LQR.

LGJul 16, 2024
Safe Online Convex Optimization with Multi-Point Feedback

Spencer Hutchinson, Mahnoosh Alizadeh

Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\mathcal{O}(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.

LGJan 26, 2025
Decentralized Low-Rank Fine-Tuning of Large Language Models

Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani

While parameter-efficient fine-tuning (PEFT) techniques like Low-Rank Adaptation (LoRA) offer computationally efficient adaptations of Large Language Models (LLMs), their practical deployment often assumes centralized data and training environments. However, real-world scenarios frequently involve distributed, privacy-sensitive datasets that require decentralized solutions. Federated learning (FL) addresses data privacy by coordinating model updates across clients, but it is typically based on centralized aggregation through a parameter server, which can introduce bottlenecks and communication constraints. Decentralized learning, in contrast, eliminates this dependency by enabling direct collaboration between clients, improving scalability and efficiency in distributed environments. Despite its advantages, decentralized LLM fine-tuning remains underexplored. In this work, we propose Dec-LoRA, a decentralized fine-tuning algorithm for LLMs based on LoRA. Through extensive experiments on BERT and LLaMA-2 models, we demonstrate that Dec-LoRA achieves performance comparable to centralized LoRA under various conditions, including data heterogeneity and quantization constraints. Additionally, we provide a rigorous theoretical guarantee proving the convergence of our algorithm to a stationary point for non-convex and smooth loss functions. These findings highlight the potential of Dec-LoRA for scalable LLM fine-tuning in decentralized environments.

LGMay 21, 2025
Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models

Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh et al.

Vision-Language Models (VLMs) such as CLIP have shown remarkable performance in cross-modal tasks through large-scale contrastive pre-training. To adapt these large transformer-based models efficiently for downstream tasks, Parameter-Efficient Fine-Tuning (PEFT) techniques like (Low-Rank Adaptation) LoRA have emerged as scalable alternatives to full fine-tuning, especially in few-shot scenarios. However, like traditional deep neural networks, VLMs are highly vulnerable to adversarial attacks, where imperceptible perturbations can significantly degrade model performance. Adversarial training remains the most effective strategy for improving model robustness in PEFT. In this work, we propose AdvCLIP-LoRA, to our knowledge the first method designed to enhance the adversarial robustness of CLIP models fine-tuned with LoRA in few-shot settings. Our method formulates training as a minimax optimization over low-rank adapters and adversarial perturbations, enabling robust adaptation with a small trainable footprint. Across eight datasets and two backbones (ViT-B/16 and ViT-B/32), AdvCLIP-LoRA achieves state-of-the-art performance in few-shot classification, adversarial base-to-new generalization, and cross-dataset transfer, delivering higher adversarial robustness than prompt tuning baselines without sacrificing much clean accuracy. These findings highlight AdvCLIP-LoRA as a practical approach for robust adaptation of VLMs in resource-constrained settings.

LGMay 2, 2024
Robust Decentralized Learning with Local Updates and Gradient Tracking

Sajjad Ghiasvand, Amirhossein Reisizadeh, Mahnoosh Alizadeh et al.

As distributed learning applications such as Federated Learning, the Internet of Things (IoT), and Edge Computing grow, it is critical to address the shortcomings of such technologies from a theoretical perspective. As an abstraction, we consider decentralized learning over a network of communicating clients or nodes and tackle two major challenges: data heterogeneity and adversarial robustness. We propose a decentralized minimax optimization method that employs two important modules: local updates and gradient tracking. Minimax optimization is the key tool to enable adversarial training for ensuring robustness. Having local updates is essential in Federated Learning (FL) applications to mitigate the communication bottleneck, and utilizing gradient tracking is essential to proving convergence in the case of data heterogeneity. We analyze the performance of the proposed algorithm, Dec-FedTrack, in the case of nonconvex-strongly concave minimax optimization, and prove that it converges a stationary point. We also conduct numerical experiments to support our theoretical findings.

LGOct 16, 2024
Communication-Efficient and Tensorized Federated Fine-Tuning of Large Language Models

Sajjad Ghiasvand, Yifan Yang, Zhiyu Xue et al.

Parameter-efficient fine-tuning (PEFT) methods typically assume that Large Language Models (LLMs) are trained on data from a single device or client. However, real-world scenarios often require fine-tuning these models on private data distributed across multiple devices. Federated Learning (FL) offers an appealing solution by preserving user privacy, as sensitive data remains on local devices during training. Nonetheless, integrating PEFT methods into FL introduces two main challenges: communication overhead and data heterogeneity. In this paper, we introduce FedTT and FedTT+, methods for adapting LLMs by integrating tensorized adapters into client-side models' encoder/decoder blocks. FedTT is versatile and can be applied to both cross-silo FL and large-scale cross-device FL. FedTT+, an extension of FedTT tailored for cross-silo FL, enhances robustness against data heterogeneity by adaptively freezing portions of tensor factors, further reducing the number of trainable parameters. Experiments on BERT and LLaMA models demonstrate that our proposed methods successfully address data heterogeneity challenges and perform on par or even better than existing federated PEFT approaches while achieving up to 10$\times$ reduction in communication cost.

LGMar 9, 2024
Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints

Spencer Hutchinson, Tianyi Chen, Mahnoosh Alizadeh

We study the problem of online convex optimization (OCO) under unknown linear constraints that are either static, or stochastically time-varying. For this problem, we introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show that it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation. In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions. In the case of stochastic time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation under more general convex constraints and a different set of assumptions. In addition to our theoretical guarantees, we also give numerical results that further validate the effectiveness of our approach.

LGAug 23, 2025
Stochastic Gradient Descent with Strategic Querying

Nanfei Jiang, Hoi-To Wai, Mahnoosh Alizadeh

This paper considers a finite-sum optimization problem under first-order queries and investigates the benefits of strategic querying on stochastic gradient-based methods compared to uniform querying strategy. We first introduce Oracle Gradient Querying (OGQ), an idealized algorithm that selects one user's gradient yielding the largest possible expected improvement (EI) at each step. However, OGQ assumes oracle access to the gradients of all users to make such a selection, which is impractical in real-world scenarios. To address this limitation, we propose Strategic Gradient Querying (SGQ), a practical algorithm that has better transient-state performance than SGD while making only one query per iteration. For smooth objective functions satisfying the Polyak-Lojasiewicz condition, we show that under the assumption of EI heterogeneity, OGQ enhances transient-state performance and reduces steady-state variance, while SGQ improves transient-state performance over SGD. Our numerical experiments validate our theoretical findings.

CVJul 7, 2025
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models

Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani

Vision-Language Models (VLMs) like CLIP have demonstrated remarkable generalization in zero- and few-shot settings, but adapting them efficiently to decentralized, heterogeneous data remains a challenge. While prompt tuning has emerged as a popular parameter-efficient approach in personalized federated learning, existing methods often sacrifice generalization in favor of personalization, struggling particularly on unseen classes or domains. In this work, we propose pFedMMA, the first personalized federated learning framework that leverages multi-modal adapters for vision-language tasks. Each adapter contains modality-specific up- and down-projection layers alongside a globally shared projection that aligns cross-modal features. Our optimization strategy allows clients to locally adapt to personalized data distributions while collaboratively training the shared projection to improve global generalization. This design is also communication-efficient, as only the shared component is exchanged during communication rounds. Through extensive experiments across eleven datasets, including domain- and label-shift scenarios, we show that pFedMMA achieves state-of-the-art trade-offs between personalization and generalization, outperforming recent federated prompt tuning methods.

OCApr 23, 2025
The Safety-Privacy Tradeoff in Linear Bandits

Arghavan Zibaie, Spencer Hutchinson, Ramtin Pedarsani et al.

We consider a collection of linear stochastic bandit problems, each modeling the random response of different agents to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private and in order to protect their sensitive information, the data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing higher level of privacy to different agents would have consequences in terms of safety and regret. We formalize these tradeoffs by building on the notion of the sharpness of the safety set - a measure of how the geometric properties of the safe set affects the growth of regret - and propose a unilaterally unimprovable vector of privacy levels for different agents given a maximum regret budget.

LGFeb 18, 2025
Constrained Online Convex Optimization with Polyak Feasibility Steps

Spencer Hutchinson, Mahnoosh Alizadeh

In this work, we study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$. Prior work on this problem has shown $O(\sqrt{T})$ regret and cumulative constraint satisfaction $\sum_{t=1}^{T} g(x_t) \leq 0$, while only accessing the constraint value and subgradient at the played actions $g(x_t), \partial g(x_t)$. Using the same constraint information, we show a stronger guarantee of anytime constraint satisfaction $g(x_t) \leq 0 \ \forall t \in [T]$, and matching $O(\sqrt{T})$ regret guarantees. These contributions are thanks to our approach of using Polyak feasibility steps to ensure constraint satisfaction, without sacrificing regret. Specifically, after each step of online gradient descent, our algorithm applies a subgradient descent step on the constraint function where the step-size is chosen according to the celebrated Polyak step-size. We further validate this approach with numerical experiments.

LGMay 1, 2023
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.

OCJun 28, 2021
Robust Distributed Optimization With Randomly Corrupted Gradients

Berkay Turan, Cesar A. Uribe, Hoi-To Wai et al.

In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures-arbitrary and potentially adversarial behavior, where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behaviors at different time instants. We set no restrictions on the maximum number of Byzantine agents at any given time. We design our method based on three layers of defense: 1) temporal robust aggregation, 2) spatial robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation. We provide convergence guarantees of our method for strongly convex and smooth non-convex cost functions.

LGJun 9, 2021
Feature and Parameter Selection in Stochastic Linear Bandits

Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori et al.

We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$. However, the agent only has access to misspecified models, i.e.,~estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze a computationally efficient algorithm that is based on a reduction from bandits to full-information problems. This allows us to obtain regret bounds that are not worse (up to a $\sqrt{\log M}$ factor) than the case where the true model is known. This is the best-reported dependence on the number of models $M$ in these settings. Finally, we empirically show the effectiveness of our algorithms using synthetic and real-world experiments.

LGSep 30, 2020
Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the learner must choose actions that not only maximize cumulative reward across the entire time horizon but further satisfy a linear baseline constraint that takes the form of a lower bound on the instantaneous reward. For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order O(\sqrt{T} \log^{3/2}T) and O(\sqrt{T} \log T), respectively. Notably, the proposed algorithms can be adjusted with only minor modifications to tackle different problem variations, such as constraints with bandit-feedback, or an unknown sequence of baseline actions. We discuss these and other improvements over the state-of-the-art. For instance, compared to existing solutions, we show that SCLTS plays the (non-optimal) baseline action at most O(\log{T}) times (compared to O(\sqrt{T})). Finally, we make connections to another studied form of safety constraints that takes the form of an upper bound on the instantaneous reward. While this incurs additional complexity to the learning process as the optimal action is not guaranteed to belong to the safe set at each round, we show that SCLUCB can properly adjust in this setting via a simple modification.

LGMay 5, 2020
Regret Bounds for Safe Gaussian Process Bandit Optimization

Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms.

LGNov 6, 2019
Safe Linear Thompson Sampling with Side Information

Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh et al.

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this problem and show a frequentist regret of order $\mathcal{O} (d^{3/2}\log^{1/2}d \cdot T^{1/2}\log^{3/2}T)$, which remarkably matches the results provided by (Abeille et al., 2017) for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round.

LGAug 16, 2019
Linear Stochastic Bandits Under Safety Constraints

Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Bandit algorithms have various application in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively in ensuring that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes necessary modifications to respect safety constraints. The algorithm has two phases. During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set. Once this goal is achieved, the algorithm begins a safe exploration-exploitation phase where the learner gradually expands their estimate of the set of safe actions while controlling the growth of regret. We provide a general regret bound for the algorithm, as well as a problem dependent bound that is connected to the location of the optimal action within the safe set. We then propose a modified heuristic that exploits our problem dependent analysis to improve the regret.

MAMay 15, 2019
An Online Pricing Mechanism for Electric Vehicle Parking Assignment and Charge Scheduling

Nathaniel Tucker, Bryce Ferguson, Mahnoosh Alizadeh

In this paper, we design a pricing framework for online electric vehicle (EV) parking assignment and charge scheduling. Here, users with electric vehicles want to park and charge at electric-vehicle-supply-equipment (EVSEs) at different locations and arrive/depart throughout the day. The goal is to assign and schedule users to the available EVSEs while maximizing user utility and minimizing operational costs. Our formulation can accommodate multiple locations, limited resources, operational costs, as well as variable arrival patterns. With this formulation, the parking facility management can optimize for behind-the-meter solar integration and reduce costs due to procuring electricity from the grid. We use an online pricing mechanism to approximate the EVSE reservation problem's solution and we analyze the performance compared to the offline solution. Our numerical simulation validates the performance of the EVSE reservation system in a downtown area with multiple parking locations equipped with EVSEs.

SYMay 15, 2019
An Online Admission Control Mechanism for Electric Vehicles at Public Parking Infrastructures

Nathaniel Tucker, Mahnoosh Alizadeh

We study an online reservation system that allows electric vehicles (EVs) to park and charge at parking facilities equipped with electric vehicle supply equipment (EVSEs). We consider the case where EVs arrive in an online fashion and the facility coordinator must immediately make an admission or rejection decision as well as assign a specific irrevocable parking spot to each admitted EV. By means of strategic user admittance and smart charging, the objective of the facility coordinator is to maximize total user utility minus the operational costs of the facilities. We discuss an online pricing mechanism based on primal-dual methods for combinatorial auctions that functions as both an admission controller and a distributor of the facilities' limited charging resources. We analyze the online pricing mechanism's performance compared to the optimal offline solution and provide numerical results that validate the mechanism's performance for various test cases.

SYSep 14, 2017
On the interaction between Autonomous Mobility-on-Demand systems and the power network: models and coordination algorithms

Federico Rossi, Ramon Iglesias, Mahnoosh Alizadeh et al.

We study the interaction between a fleet of electric, self-driving vehicles servicing on-demand transportation requests (referred to as Autonomous Mobility-on-Demand, or AMoD, system) and the electric power network. We propose a model that captures the coupling between the two systems stemming from the vehicles' charging requirements and captures time-varying customer demand and power generation costs, road congestion, battery depreciation, and power transmission and distribution constraints. We then leverage the model to jointly optimize the operation of both systems. We devise an algorithmic procedure to losslessly reduce the problem size by bundling customer requests, allowing it to be efficiently solved by off-the-shelf linear programming solvers. Next, we show that the socially optimal solution to the joint problem can be enforced as a general equilibrium, and we provide a dual decomposition algorithm that allows self-interested agents to compute the market clearing prices without sharing private information. We assess the performance of the mode by studying a hypothetical AMoD system in Dallas-Fort Worth and its impact on the Texas power network. Lack of coordination between the AMoD system and the power network can cause a 4.4% increase in the price of electricity in Dallas-Fort Worth; conversely, coordination between the AMoD system and the power network could reduce electricity expenditure compared to the case where no cars are present (despite the increased demand for electricity) and yield savings of up $147M/year. Finally, we provide a receding-horizon implementation and assess its performance with agent-based simulations. Collectively, the results of this paper provide a first-of-a-kind characterization of the interaction between electric-powered AMoD systems and the power network, and shed additional light on the economic and societal value of AMoD.

SYSep 14, 2016
Optimal Pricing to Manage Electric Vehicles in Coupled Power and Transportation Networks

Mahnoosh Alizadeh, Hoi-To Wai, Mainak Chowdhury et al.

We study the system-level effects of the introduction of large populations of Electric Vehicles on the power and transportation networks. We assume that each EV owner solves a decision problem to pick a cost-minimizing charge and travel plan. This individual decision takes into account traffic congestion in the transportation network, affecting travel times, as well as as congestion in the power grid, resulting in spatial variations in electricity prices for battery charging. We show that this decision problem is equivalent to finding the shortest path on an "extended" transportation graph, with virtual arcs that represent charging options. Using this extended graph, we study the collective effects of a large number of EV owners individually solving this path planning problem. We propose a scheme in which independent power and transportation system operators can collaborate to manage each network towards a socially optimum operating point while keeping the operational data of each system private. We further study the optimal reserve capacity requirements for pricing in the absence of such collaboration. We showcase numerically that a lack of attention to interdependencies between the two infrastructures can have adverse operational effects.