Oron Sabag

OC
h-index: 17
6 papers
35 citations
Novelty: 58%
AI Score: 34

6 Papers

OC Jun 3, 2022
Optimal Competitive-Ratio Control

Oron Sabag, Sahin Lale, Babak Hassibi

Inspired by competitive policy design approaches in online learning, new control paradigms such as competitive-ratio and regret-optimal control have recently been proposed as alternatives to the classical $\mathcal{H}_2$ and $\mathcal{H}_\infty$ approaches. These competitive metrics compare the control cost of the designed controller against the cost of a clairvoyant controller, which has access to past, present, and future disturbances, in terms of ratio and difference, respectively. While prior work provided the optimal solution for the regret-optimal control problem, in competitive-ratio control the solution has been provided only for the sub-optimal problem. In this work, we derive the optimal solution to the competitive-ratio control problem. We show that the optimal competitive ratio can be computed as the maximal eigenvalue of a simple matrix, and provide a state-space controller that achieves the optimal competitive ratio. We conduct an extensive numerical study to verify this analytical solution, and demonstrate that the optimal competitive-ratio controller outperforms other controllers on several large-scale practical systems. The key techniques that underpin our explicit solution are a reduction of the control problem to a Nehari problem and a novel factorization of the clairvoyant controller's cost. We reveal an interesting relation between the explicit solutions that now exist for both competitive control paradigms by formulating a regret-optimal control framework with weight functions that can also be utilized for practical purposes.
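
As a notational sketch of the two metrics described above (the symbols $J$, $K$, $K_0$, and $w$ are assumed here for illustration and are not taken from the abstract): let $J(K, w)$ be the control cost of a causal controller $K$ under the disturbance $w$, and let $K_0$ be the clairvoyant controller. The ratio and difference criteria can then be written as

$$
\mathrm{CR}(K) = \sup_{w \neq 0} \frac{J(K, w)}{J(K_0, w)}, \qquad \mathrm{Regret}(K) = \sup_{\|w\|_2 \le 1} \bigl( J(K, w) - J(K_0, w) \bigr),
$$

where the unit-energy bound on $w$ in the regret is one common normalization, assumed here. In each case the design problem is to minimize the criterion over causal controllers $K$.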

LG Jun 19, 2025
Optimal Online Bookmaking for Any Number of Outcomes

Hadar Tal, Oron Sabag

We study the Online Bookmaking problem, where a bookmaker dynamically updates betting odds on the possible outcomes of an event. In each betting round, the bookmaker can adjust the odds based on the cumulative betting behavior of gamblers, aiming to maximize profit while mitigating potential loss. We show that for any event and any number of betting rounds, in a worst-case setting over all possible gamblers and outcome realizations, the bookmaker's optimal loss is the largest root of a simple polynomial. Our solution shows that bookmakers can be as fair as desired while avoiding financial risk, and the explicit characterization reveals an intriguing relation between the bookmaker's regret and Hermite polynomials. We develop an efficient algorithm that computes the optimal bookmaking strategy: when facing an optimal gambler, the algorithm achieves the optimal loss, and in rounds where the gambler is suboptimal, it reduces the achieved loss to the optimal opportunistic loss, a notion that is related to subgame perfect Nash equilibrium. The key technical contribution to achieve these results is an explicit characterization of the Bellman-Pareto frontier, which unifies the dynamic programming updates for Bellman's value function with the multi-criteria optimization framework of the Pareto frontier in the context of vector repeated games.

GT Jan 12, 2025
Optimal Online Bookmaking for Binary Games

Alankrita Bhatt, Or Ordentlich, Oron Sabag

In online betting, the bookmaker can update the payoffs it offers on a particular event many times before the event takes place, and the updated payoffs may depend on the bets accumulated thus far. We study the problem of bookmaking with the goal of maximizing the return in the worst case, with respect to the gamblers' behavior and the event's outcome. We formalize this problem as the \emph{Optimal Online Bookmaking game}, and provide the exact solution for the binary case. To this end, we develop the optimal bookmaking strategy, which relies on a new technique called bi-balancing trees. This technique ensures that the house loss is the same for all \emph{decisive} betting sequences, in which the gambler bets all of its money on a single outcome in each round.

OC May 4, 2021
Regret-Optimal LQR Control

Oron Sabag, Gautam Goel, Sahin Lale et al.

We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, we introduce the dynamic regret as a criterion for controller design. It is defined as the difference between the LQR cost of a causal controller (that has access only to past disturbances) and the LQR cost of the \emph{unique} clairvoyant one (that also has access to future disturbances), which is known to dominate all other controllers. The regret itself is a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded-energy disturbances. The resulting controller has the interpretation of guaranteeing the smallest regret compared to the best non-causal controller that can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller for the state-space setting. These explicit solutions are obtained by showing that the regret-optimal control problem can be reduced to a Nehari extension problem that can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller ($n$ is the state dimension), and its construction simply requires a solution to the standard LQR Riccati equation and two Lyapunov equations. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\infty$ optimal controllers, and generally has $H_2$ and $H_\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.
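
The abstract above names a standard LQR Riccati equation and Lyapunov equations as the computational ingredients. The following is a minimal sketch of those generic building blocks only, not the paper's regret-optimal construction. It assumes a discrete-time plant (the abstract does not specify the time domain), uses a plain fixed-point Riccati iteration in place of a library solver, and the function names and the double-integrator plant are hypothetical choices made for illustration.

```python
import numpy as np

def lqr_gain(A, B, R, P):
    """State-feedback gain K (u = -K x) for a given cost-to-go matrix P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def solve_lqr(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Solve the discrete-time algebraic Riccati equation by fixed-point iteration.

    Assumption: (A, B) is stabilizable and (A, Q^{1/2}) is detectable, so the
    iteration started at P = Q converges to the stabilizing solution.
    """
    P = Q.copy()
    for _ in range(max_iter):
        K = lqr_gain(A, B, R, P)
        # Riccati recursion: P <- Q + A' P A - A' P B (R + B' P B)^{-1} B' P A
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol * max(1.0, np.max(np.abs(P_next)))
        P = P_next
        if converged:
            return P, lqr_gain(A, B, R, P)
    raise RuntimeError("Riccati iteration did not converge")

def solve_discrete_lyapunov(A, W):
    """Solve X = A X A' + W via the Kronecker-product (vectorized) form."""
    n = A.shape[0]
    # With row-major flattening, vec(A X A') = (A kron A) vec(X).
    lhs = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(lhs, W.flatten()).reshape(n, n)

# Hypothetical example plant: a sampled double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = solve_lqr(A, B, Q, R)
A_cl = A - B @ K                                # closed-loop dynamics under u = -K x
spectral_radius = max(abs(np.linalg.eigvals(A_cl)))
X = solve_discrete_lyapunov(A_cl, np.eye(2))    # a generic Lyapunov equation for A_cl
```

The Lyapunov equation solved at the end is a generic one for the closed-loop matrix; it is not claimed to be either of the two equations used in the paper. For larger systems a dedicated solver would be preferable to the Kronecker form, whose linear system has dimension $n^2$.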

OC Jan 25, 2021
Regret-Optimal Filtering for Prediction and Estimation

Oron Sabag, Babak Hassibi

The filtering problem of causally estimating a desired signal from a related observation signal is investigated through the lens of regret optimization. Classical filter designs, such as $\mathcal H_2$ (Kalman) and $\mathcal H_\infty$, minimize the average and worst-case estimation errors, respectively. As a result, $\mathcal H_2$ filters are sensitive to inaccuracies in the underlying statistical model, and $\mathcal H_\infty$ filters are overly conservative since they safeguard against the worst-case scenario. We propose instead to minimize the \emph{regret} in order to design filters that perform well in different noise regimes by comparing their performance with that of a clairvoyant filter. More explicitly, we minimize the largest deviation of the squared estimation error of a causal filter from that of a non-causal filter that has access to future observations. In this sense, the regret-optimal filter will have the best competitive performance with respect to the non-causal benchmark filter no matter what the true signal and the observation process are. For the important case of signals that can be described with a time-invariant state-space model, we provide an explicit construction for the regret-optimal filter in the estimation (causal) and the prediction (strictly causal) regimes. These solutions are obtained by reducing the regret filtering problem to a Nehari problem, i.e., approximating a non-causal operator by a causal one in spectral norm. The regret-optimal filters bear some resemblance to Kalman and $\mathcal H_\infty$ filters: they are expressed as state-space models, inherit the finite dimension of the original state-space, and their solutions require solving algebraic Riccati equations. Numerical simulations demonstrate that regret minimization inherently interpolates between the performances of the $\mathcal H_2$ and $\mathcal H_\infty$ filters and is thus a viable approach for filter design.
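
The Nehari problem mentioned above can be sketched in symbols as follows, where $\mathcal{L}$ denotes the given non-causal operator and $\mathcal{K}$ the causal approximant (both names are assumed here for illustration and do not come from the abstract):

$$
\inf_{\mathcal{K} \ \text{causal}} \ \| \mathcal{L} - \mathcal{K} \|,
$$

with $\| \cdot \|$ the spectral (operator) norm. The specific operator $\mathcal{L}$ that arises in the regret filtering problem is derived in the paper and is not reproduced here.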

IT Jan 27, 2020
Computing the Feedback Capacity of Finite State Channels using Reinforcement Learning

Ziv Aharoni, Oron Sabag, Haim Henry Permuter

In this paper, we propose a novel method to compute the feedback capacity of channels with memory using reinforcement learning (RL). In RL, one seeks to maximize cumulative rewards collected in a sequential decision-making environment. This is done by collecting samples of the underlying environment and using them to learn the optimal decision rule. The main advantage of this approach is its computational efficiency, even in high-dimensional problems. Hence, RL can be used to numerically estimate the feedback capacity of unifilar finite state channels (FSCs) with large alphabet size. The outcome of the RL algorithm sheds light on the properties of the optimal decision rule, which, in our case, is the optimal input distribution of the channel. These insights can be converted into analytic, single-letter capacity expressions by solving corresponding lower and upper bounds. We demonstrate the efficiency of this method by analytically solving the feedback capacity of the well-known Ising channel with a ternary alphabet. We also provide a simple coding scheme that achieves the feedback capacity.