Spencer Hutchinson

LG
h-index32
8papers
18citations
Novelty49%
AI Score44

8 Papers

LGAug 29, 2023
Directional Optimism for Safe Linear Bandits

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.

OCApr 20
Steady-state Based Approach to Online Non-stochastic Control

Vijeth Hebbar, Spencer Hutchinson, Mahnoosh Alizadeh et al.

We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an \emph{affine controller}. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.

OCApr 24
Near-Optimal Regret for the Safe Learning-based Control of the Constrained Linear Quadratic Regulator

Spencer Hutchinson, Nanfei Jiang, Mahnoosh Alizadeh

We study the problem of adaptive control of the stochastic linear quadratic regulator (LQR) with constraints that must be satisfied at every time step. Prior work on the multidimensional problem has shown $\tilde{O}(T^{2/3})$ regret and satisfaction of robust constraints, leaving open the question of whether $\tilde{O}(\sqrt{T})$ regret can be attained in the constrained LQR setting. We contribute to this problem by showing $\tilde{O}(\sqrt{T})$ regret and satisfaction of chance constraints. This type of constraints allow us to handle unbounded noise and also enable analytical techniques not directly applicable to robust constraints. Our proposed algorithm for this problem uses an SDP to select an optimistic policy, and then "scales back" this policy until it is verifiably-safe. Our theoretical analysis establishes regret and constraint guarantees via a key lemma that bounds the system covariance in terms of the chosen policy. This covariance-based analysis is in contrast with the cost-to-go based analysis that is typically used in adaptive LQR.

LGJul 16, 2024
Safe Online Convex Optimization with Multi-Point Feedback

Spencer Hutchinson, Mahnoosh Alizadeh

Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\mathcal{O}(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.

LGMar 9, 2024
Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints

Spencer Hutchinson, Tianyi Chen, Mahnoosh Alizadeh

We study the problem of online convex optimization (OCO) under unknown linear constraints that are either static, or stochastically time-varying. For this problem, we introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show that it enjoys $\tilde{O}(\sqrt{T})$ regret and no constraint violation. In the case of static linear constraints, this improves on the previous best known $\tilde{O}(T^{2/3})$ regret under the same assumptions. In the case of stochastic time-varying constraints, our work supplements existing results that show $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation under more general convex constraints and a different set of assumptions. In addition to our theoretical guarantees, we also give numerical results that further validate the effectiveness of our approach.

OCApr 23, 2025
The Safety-Privacy Tradeoff in Linear Bandits

Arghavan Zibaie, Spencer Hutchinson, Ramtin Pedarsani et al.

We consider a collection of linear stochastic bandit problems, each modeling the random response of different agents to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private and in order to protect their sensitive information, the data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing higher level of privacy to different agents would have consequences in terms of safety and regret. We formalize these tradeoffs by building on the notion of the sharpness of the safety set - a measure of how the geometric properties of the safe set affects the growth of regret - and propose a unilaterally unimprovable vector of privacy levels for different agents given a maximum regret budget.

LGFeb 18, 2025
Constrained Online Convex Optimization with Polyak Feasibility Steps

Spencer Hutchinson, Mahnoosh Alizadeh

In this work, we study online convex optimization with a fixed constraint function $g : \mathbb{R}^d \rightarrow \mathbb{R}$. Prior work on this problem has shown $O(\sqrt{T})$ regret and cumulative constraint satisfaction $\sum_{t=1}^{T} g(x_t) \leq 0$, while only accessing the constraint value and subgradient at the played actions $g(x_t), \partial g(x_t)$. Using the same constraint information, we show a stronger guarantee of anytime constraint satisfaction $g(x_t) \leq 0 \ \forall t \in [T]$, and matching $O(\sqrt{T})$ regret guarantees. These contributions are thanks to our approach of using Polyak feasibility steps to ensure constraint satisfaction, without sacrificing regret. Specifically, after each step of online gradient descent, our algorithm applies a subgradient descent step on the constraint function where the step-size is chosen according to the celebrated Polyak step-size. We further validate this approach with numerical experiments.

LGMay 1, 2023
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback

Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.