Subhonmesh Bose

ML
h-index58
8papers
3citations
Novelty61%
AI Score39

8 Papers

OCAug 15, 2014
Optimal Placement of Distributed Energy Storage in Power Networks

Christos Thrampoulidis, Subhonmesh Bose, Babak Hassibi

We formulate the optimal placement, sizing and control of storage devices in a power network to minimize generation costs with the intent of load shifting. We assume deterministic demand, a linearized DC approximated power flow model and a fixed available storage budget. Our main result proves that when the generation costs are convex and nondecreasing, there always exists an optimal storage capacity allocation that places zero storage at generation-only buses that connect to the rest of the network via single links. This holds regardless of the demand profiles, generation capacities, line-flow limits and characteristics of the storage technologies. Through a counterexample, we illustrate that this result is not generally true for generation buses with multiple connections. For specific network topologies, we also characterize the dependence of the optimal generation cost on the available storage budget, generation capacities and flow constraints.

OCApr 2, 2017
Cash-settled options for wholesale electricity markets

Khaled Alshehri, Subhonmesh Bose, Tamer Başar

Wholesale electricity market designs in practice do not provide the market participants with adequate mechanisms to hedge their financial risks. Demanders and suppliers will likely face even greater risks with the deepening penetration of variable renewable resources like wind and solar. This paper explores the design of a centralized cash-settled call option market to mitigate such risks. A cash-settled call option is a financial instrument that allows its holder the right to claim a monetary reward equal to the positive difference between the real-time price of an underlying commodity and a pre-negotiated strike price for an upfront fee. Through an example, we illustrate that a bilateral call option can reduce the payment volatility of market participants. Then, we design a centralized clearing mechanism for call options that generalizes the bilateral trade. We illustrate through an example how the centralized clearing mechanism generalizes the bilateral trade. Finally, the effect of risk preference of the market participants, as well as some generalizations are discussed.

ITMar 30
Learning Where to Look: UCB-Driven Controlled Sensing for Quickest Change Detection

Yu-Han Huang, Argyrios Gerogiannis, Subhonmesh Bose et al.

We study the multichannel quickest change detection problem with bandit feedback and controlled sensing, in which an agent sequentially selects one of the data streams to observe at each time-step and aims to detect an unknown change as quickly as possible while controlling false alarms. Assuming known pre- and post-change distributions and allowing an arbitrary subset of streams to be affected by the change, we propose two novel and computationally efficient detection procedures inspired by the Upper Confidence Bound (UCB) multi-armed bandit algorithm. Our methods adaptively concentrate sensing on the most informative streams while preserving false-alarm guarantees. We show that both procedures achieve first-order asymptotic optimality in detection delay under standard false-alarm constraints. We also extend the UCB-driven controlled sensing approach to the setting where the pre- and post-change distributions are unknown, except for a mean-shift in at least one of the channels at the change-point. This setting is particularly relevant to the problem of learning in piecewise stationary environments. Finally, extensive simulations on synthetic benchmarks show that our methods consistently outperform existing state-of-the-art approaches while offering substantial computational savings.

AIJan 2, 2025
Detection Augmented Bandit Procedures for Piecewise Stationary MABs: A Modular Approach

Yu-Han Huang, Argyrios Gerogiannis, Subhonmesh Bose et al.

Conventional Multi-Armed Bandit (MAB) algorithms are designed for stationary environments, where the reward distributions associated with the arms do not change with time. In many applications, however, the environment is more accurately modeled as being non-stationary. In this work, piecewise stationary MAB (PS-MAB) environments are investigated, in which the reward distributions associated with a subset of the arms change at some change-points and remain stationary between change-points. Our focus is on the asymptotic analysis of PS-MABs, for which practical algorithms based on change detection have been previously proposed. Our goal is to modularize the design and analysis of such Detection Augmented Bandit (DAB) procedures. To this end, we first provide novel, improved performance lower bounds for PS-MABs. Then, we identify the requirements for stationary bandit algorithms and change detectors in a DAB procedure that are needed for the modularization. We assume that the rewards are sub-Gaussian. Under this assumption and a condition on the separation of the change-points, we show that the analysis of DAB procedures can indeed be modularized, so that the regret bounds can be obtained in a unified manner for various combinations of change detectors and bandit algorithms. Through this analysis, we develop new modular DAB procedures that are order-optimal. Finally, we showcase the practical effectiveness of our modular DAB approach in our experiments, studying its regret performance compared to other methods and investigating its detection capabilities.

LGJan 31, 2025
DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits

Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose et al.

We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of non-stationary bandits without prior knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses current state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough experimental validation.

MLJan 27, 2025
Nonparametric Sparse Online Learning of the Koopman Operator

Boya Hou, Sina Sanjari, Nathan Dahlin et al.

The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. Data-driven techniques to learn the Koopman operator typically assume that the chosen function space is closed under system dynamics. In this paper, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and explore the mis-specified scenario where the dynamics may escape the chosen function space. We relate the Koopman operator to the conditional mean embeddings (CME) operator and then present an operator stochastic approximation algorithm to learn the Koopman operator iteratively with control over the complexity of the representation. We provide both asymptotic and finite-time last-iterate guarantees of the online sparse learning algorithm with trajectory-based sampling with an analysis that is substantially more involved than that for finite-dimensional stochastic approximation. Numerical examples confirm the effectiveness of the proposed algorithm.

MLNov 19, 2024
Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds

Arda Güçlü, Subhonmesh Bose

We propose and analyze TRAiL (Tangential Randomization in Linear Bandits), a computationally efficient regret-optimal forced exploration algorithm for linear bandits on action sets that are sublevel sets of strongly convex functions. TRAiL estimates the governing parameter of the linear bandit problem through a standard regularized least squares and perturbs the reward-maximizing action corresponding to said point estimate along the tangent plane of the convex compact action set before projecting back to it. Exploiting concentration results for matrix martingales, we prove that TRAiL ensures a $Ω(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix with high-probability over a $T$-length period. We build on this result to obtain an $\mathcal{O}(\sqrt{T} \log(T))$ upper bound on cumulative regret with probability at least $ 1 - 1/T$ over $T$ periods, and compare TRAiL to other popular algorithms for linear bandits. Then, we characterize an $Ω(\sqrt{T})$ minimax lower bound for any algorithm on the expected regret that covers a wide variety of action/parameter sets and noise processes. Our analysis not only expands the realm of lower-bounds in linear bandits significantly, but as a byproduct, yields a trade-off between regret and inference quality. Specifically, we prove that any algorithm with an $\mathcal{O}(T^α)$ expected regret growth must have an $Ω(T^{1-α})$ asymptotic growth in expected inference quality. Our experiments on the $L^p$ unit ball as action sets reveal how this relation can be violated, but only in the short-run, before returning to respect the bound asymptotically. In effect, regret-minimizing algorithms must have just the right rate of inference -- too fast or too slow inference will incur sub-optimal regret growth.

MLMay 13, 2024
Nonparametric Sparse Online Learning of the Koopman Operator

Boya Hou, Sina Sanjari, Nathan Dahlin et al.

The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. However, existing data-driven approaches to learning the Koopman operator rely on batch data. In this work, we present a sparse online learning algorithm that learns the Koopman operator iteratively via stochastic approximation, with explicit control over model complexity and provable convergence guarantees. Specifically, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and address the mis-specified scenario where the dynamics may escape the chosen RKHS. In this mis-specified setting, we relate the Koopman operator to the conditional mean embeddings (CME) operator. We further establish both asymptotic and finite-time convergence guarantees for our learning algorithm in mis-specified setting, with trajectory-based sampling where the data arrive sequentially over time. Numerical experiments demonstrate the algorithm's capability to learn unknown nonlinear dynamics.