SY · May 27
Information Age-Controllability Trade-offs in Communication-Constrained Networks
Songita Das, Gourab Ghatak, Chen Quan et al.
We investigate the trade-off between controllability, channel access, and age-related performance in a wireless network of control systems. Controllers share a random-access channel to transmit control inputs to actuators over slotted blocks. We measure reliable control via block controllability, where a block is controllable if it contains a required number of consecutive successful transmissions. In parallel, we capture information freshness via the age of information. To enable efficient allocation of channel resources over time, we introduce adaptive access probabilities at the block level, prioritizing controllers that have not yet achieved controllability. We then derive closed-form expressions for block controllability probability, the peak latency between inter-block consecutive successes, and peak age of information. We further characterize the peak control latency, defined as the time between consecutive controllable blocks. Finally, we optimize access probabilities to jointly balance controllability and age-related metrics. Numerical results illustrate the effectiveness of the proposed adaptive access policies in managing this trade-off in interference-limited wireless control networks.
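As an illustration of the block controllability notion above, the following sketch checks whether a block of slot outcomes contains the required run of consecutive successes, and computes the probability of that event by dynamic programming. It assumes independent slots with a fixed success probability `p_success`, which is a simplification and not the adaptive access model of the paper; all function and parameter names are hypothetical.

```python
def is_block_controllable(successes, required_run):
    """Return True if the block contains at least `required_run`
    consecutive successful transmissions (one bool per slot)."""
    if required_run <= 0:
        return True
    run = 0
    for ok in successes:
        run = run + 1 if ok else 0  # a failure resets the current run
        if run >= required_run:
            return True
    return False


def block_controllability_probability(n_slots, required_run, p_success):
    """Probability that a block of `n_slots` i.i.d. slots (assumption)
    contains a run of at least `required_run` successes."""
    if required_run <= 0:
        return 1.0
    # state[k]: probability that the trailing run has length k and the
    # required run has not been reached so far.
    state = [1.0] + [0.0] * (required_run - 1)
    reached = 0.0
    for _ in range(n_slots):
        new_state = [0.0] * required_run
        new_state[0] = sum(state) * (1.0 - p_success)  # failure resets the run
        for k in range(required_run - 1):
            new_state[k + 1] = state[k] * p_success    # success extends the run
        reached += state[required_run - 1] * p_success  # run completed in this slot
        state = new_state
    return reached
```

Under these assumptions, the second function could be swept over `p_success` to see how the per-slot success probability shapes the chance that a block is controllable.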
LG · May 20, 2022
Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks
Gourab Ghatak
Next-generation wireless services are characterized by a diverse set of requirements; to sustain them, wireless access points need to probe the users in the network periodically. In this regard, we study a novel multi-armed bandit (MAB) setting that mandates probing all the arms periodically while keeping track of the best current arm in a non-stationary environment. In particular, we develop \texttt{TS-GE}, which balances the regret guarantees of classical Thompson sampling (TS) with the broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. The main innovation in the algorithm is in identifying the changed arm by an optional subroutine called group exploration (GE) that scales as $\log_2(K)$ for a $K$-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. We highlight the conditions under which the regret guarantee of \texttt{TS-GE} outperforms that of the state-of-the-art algorithms, in particular, \texttt{ADSWITCH} and \texttt{M-UCB}. We demonstrate the efficacy of \texttt{TS-GE} by employing it in two wireless system applications: task offloading in mobile-edge computing (MEC) and an industrial internet-of-things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).
AI · Dec 18, 2025
Weighted K-Harmonic Means Clustering: Convergence Analysis and Applications to Wireless Communications
Gourab Ghatak
We propose the \emph{weighted K-harmonic means} (WKHM) clustering algorithm, a regularized variant of K-harmonic means designed to ensure numerical stability while enabling soft assignments through inverse-distance weighting. Unlike classical K-means and constrained K-means, WKHM admits a direct interpretation in wireless networks: its weights are exactly equivalent to fractional user association based on received signal strength. We establish rigorous convergence guarantees under both deterministic and stochastic settings, addressing key technical challenges arising from non-convexity and random initialization. Specifically, we prove monotone descent to a local minimum under fixed initialization, convergence in probability under Binomial Point Process (BPP) initialization, and almost sure convergence under mild decay conditions. These results provide the first stochastic convergence guarantees for harmonic-mean-based clustering. Finally, through extensive simulations with diverse user distributions, we show that WKHM achieves a superior tradeoff between minimum signal strength and load fairness compared to classical and modern clustering baselines, making it a principled tool for joint radio node placement and user association in wireless networks.
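To make the idea of soft assignment through inverse-distance weighting concrete, the sketch below computes normalized weights of one user location with respect to a set of candidate centers. The exact WKHM weight and regularization are not given in the abstract, so the form used here, weights proportional to `1 / (distance**exponent + reg)`, is an assumption, as are the function name and the default values.

```python
import math


def soft_assignment(point, centers, exponent=2.0, reg=1e-9):
    """Inverse-distance soft assignment of `point` to each center.

    Assumed form (not taken from the paper): weight_k is proportional to
    1 / (d_k ** exponent + reg), normalized to sum to one. The small
    `reg` term avoids division by zero when a point coincides with a center.
    """
    inverse = [1.0 / (math.dist(point, c) ** exponent + reg) for c in centers]
    total = sum(inverse)
    return [value / total for value in inverse]
```

Read in the wireless interpretation mentioned above, `exponent` would play the role of a path-loss exponent and each weight the fraction of a user's association with a radio node, again under the stated assumptions.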
LG · Aug 19, 2025
Order Optimal Regret Bounds for Sharpe Ratio Optimization in the Bandit Setting
Mohammad Taha Shah, Sabrina Khurshid, Gourab Ghatak
In this paper, we investigate the problem of sequential decision-making for Sharpe ratio (SR) maximization in a stochastic bandit setting. We focus on the Thompson Sampling (TS) algorithm, a Bayesian approach celebrated for its empirical performance and exploration efficiency, under the assumption of Gaussian rewards with unknown parameters. Unlike conventional bandit objectives focusing on maximizing cumulative reward, Sharpe ratio optimization instead introduces an inherent tradeoff between achieving high returns and controlling risk, demanding careful exploration of both mean and variance. Our theoretical contributions include a novel regret decomposition specifically designed for the Sharpe ratio, highlighting the role of information acquisition about the reward distribution in driving learning efficiency. Then, we establish fundamental performance limits for the proposed algorithm \texttt{SRTS} in terms of an upper bound on regret. We also derive the matching lower bound and show the order-optimality. Our results show that Thompson Sampling achieves logarithmic regret over time, with distribution-dependent factors capturing the difficulty of distinguishing arms based on risk-adjusted performance. Empirical simulations show that our algorithm significantly outperforms existing algorithms.
LG · May 17, 2025
Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification
Sabrina Khurshid, Gourab Ghatak, Mohammad Shahid Abdulla
This paper focuses on selecting the arm with the highest variance from a set of $K$ independent arms. Specifically, we consider two settings: (i) the regret setting, which penalizes the number of pulls of suboptimal arms in terms of variance, and (ii) the fixed-budget best arm identification (BAI) setting, which evaluates the ability of an algorithm to determine the arm with the highest variance after a fixed number of pulls. We develop a novel online algorithm called \texttt{UCB-VV} for the regret setting and show that its upper bound on regret for bounded rewards evolves as $\mathcal{O}\left(\log{n}\right)$, where $n$ is the horizon. By deriving the lower bound on the regret, we show that \texttt{UCB-VV} is order optimal. For the fixed-budget BAI setting, we propose the \texttt{SHVV} algorithm. We show that the upper bound on the error probability of \texttt{SHVV} evolves as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, and this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance. Leveraging this inequality, we derive a concentration inequality for the empirical Sharpe ratio (SR) for sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that \texttt{UCB-VV} consistently outperforms \texttt{$\epsilon$-greedy} across different sub-optimality gaps, though it is surpassed by \texttt{VTS}, which exhibits the lowest regret but lacks theoretical guarantees. We also illustrate the superior performance of \texttt{SHVV} against uniform sampling in the fixed-budget setting under 6 different setups. Finally, we conduct a case study to empirically evaluate the performance of \texttt{UCB-VV} and \texttt{SHVV} in call option trading on $100$ stocks generated using geometric Brownian motion (GBM).
LG · Feb 3, 2025
An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration
Siddhartha Parupudi, Gourab Ghatak
We consider the best arm identification (BAI) problem in the $K$-armed bandit framework with a modification: the agent is allowed to play a subset of arms at each time slot instead of one arm. Consequently, the agent observes the sample average of the rewards of the arms that constitute the probed subset. Several trade-offs arise here; for example, sampling a larger number of arms together results in a wider view of the environment, while sampling fewer arms enhances the information about individual reward distributions. Furthermore, although grouping a large number of suboptimal arms together reduces the variance of the group reward, it may raise the group mean close to that of the group containing the optimal arm. To solve this problem, we propose an algorithm that constructs $\log_2 K$ groups and performs a likelihood ratio test to detect the presence of the best arm in each of these groups. Then a Hamming decoding procedure determines the unique best arm. We derive an upper bound on the error probability of the proposed algorithm based on a new hardness parameter $H_4$. Finally, we demonstrate cases in which it outperforms the state-of-the-art algorithms for the single-play case.
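The group construction and decoding step described above can be sketched as follows. This is a noiseless illustration that assumes the groups are formed from the binary representation of the arm index (group `b` holds every arm whose bit `b` is set); the abstract does not specify the construction, and the likelihood ratio test is replaced here by a given list of per-group detection outcomes. All names are hypothetical.

```python
def build_groups(num_arms):
    """Form about log2(K) groups: group b contains the arms whose
    index has bit b set (assumed binary-index construction)."""
    num_groups = max(1, (num_arms - 1).bit_length())
    return [
        [arm for arm in range(num_arms) if (arm >> bit) & 1]
        for bit in range(num_groups)
    ]


def decode_best_arm(detections):
    """Recover the arm index from per-group outcomes, where
    detections[b] is True if the best arm was detected in group b."""
    return sum(1 << bit for bit, present in enumerate(detections) if present)
```

With `K = 8`, three groups suffice, and the pattern of groups in which the best arm is detected spells out its index in binary. Arm 0 belongs to no group and is decoded when every detection is negative. In a noisy setting each detection would come from a statistical test and could be wrong, which is where the error probability analysis matters.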
ML · May 30, 2021
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits
Gourab Ghatak, Hardhik Mohanty, Aniq Ur Rahman
We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test-based Thompson Sampling (TS) algorithm, named TS-KS, that actively detects change points and resets the TS parameters once a change is detected. In particular, for the two-armed bandit case, we derive bounds on the number of samples of the reward distribution needed to detect the change once it occurs. Consequently, we show that the proposed algorithm has sub-linear regret. Contrary to existing works, our algorithm is able to detect a change when the underlying reward distribution changes even though the mean reward remains the same. Finally, to test the efficacy of the proposed algorithm, we employ it in two case studies: (i) a task-offloading scenario in wireless edge computing, and (ii) portfolio optimization. Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments. Moreover, the performance of TS-KS is on par with state-of-the-art forecasting algorithms such as Facebook-PROPHET and ARIMA.
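The following sketch illustrates why a KS test can flag a change that leaves the mean untouched. It computes the two-sample KS statistic between an older and a more recent window of rewards and compares it with the standard large-sample critical value. The windowing, the significance level `alpha`, and the function names are assumptions for illustration; the abstract does not describe how TS-KS sets its windows or thresholds.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two
    empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # advance past ties at x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def change_detected(old_rewards, new_rewards, alpha=0.05):
    """Flag a change if the KS statistic exceeds the asymptotic
    critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(old_rewards), len(new_rewards)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(old_rewards, new_rewards) > threshold
```

For instance, rewards alternating between 0.4 and 0.6 and rewards alternating between 0 and 1 share the mean 0.5, so a rule based on the mean alone sees no difference, while the KS statistic between the two windows is large. A detection would then trigger a reset of the TS parameters, as described above.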
LG · Sep 6, 2020
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
Gourab Ghatak
We consider a non-stationary two-armed bandit framework and propose a change-detection based Thompson sampling (TS) algorithm, named TS with change-detection (TS-CD), to keep track of the dynamic environment. The non-stationarity is modeled using a Poisson arrival process, which changes the mean of the rewards on each arrival. The proposed strategy compares the empirical mean of the recent rewards of an arm with the estimate of the mean of the rewards from its history. It detects a change when the empirical mean deviates from the mean estimate by a value larger than a threshold. Then, we characterize the lower bound on the duration of the time-window for which the bandit framework must remain stationary for TS-CD to successfully detect a change when it occurs. Consequently, our results highlight an upper bound on the parameter for the Poisson arrival process, for which the TS-CD achieves asymptotic regret optimality with high probability. Finally, we validate the efficacy of TS-CD by testing it for edge-control of radio access technique (RAT)-selection in a wireless network. Our results show that TS-CD not only outperforms the classical max-power RAT selection strategy but also other actively adaptive and passively adaptive bandit algorithms that are designed for non-stationary environments.
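The detection rule described above, comparing the empirical mean of recent rewards with the mean estimated from the history, can be sketched as follows. How the history and recent windows are chosen and how the threshold is set are not stated in the abstract, so they are left as inputs; the function name is hypothetical.

```python
def mean_shift_detected(history, recent, threshold):
    """Return True if the mean of `recent` rewards deviates from the
    mean of `history` rewards by more than `threshold`."""
    if not history or not recent:
        return False  # not enough data to compare
    history_mean = sum(history) / len(history)
    recent_mean = sum(recent) / len(recent)
    return abs(recent_mean - history_mean) > threshold
```

A short recent window reacts faster but is noisier, which is consistent with the abstract's point that the environment must remain stationary for long enough for a change to be detected reliably.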
SP · Jun 22, 2020
An Online Algorithm for Computation Offloading in Non-Stationary Environments
Aniq Ur Rahman, Gourab Ghatak, Antonio De Domenico
We consider the latency minimization problem in a task-offloading scenario, where multiple servers are available to the user equipment for outsourcing computational tasks. To account for the temporally dynamic nature of the wireless links and the availability of the computing resources, we model the server selection as a multi-armed bandit (MAB) problem. In the considered MAB framework, rewards are characterized in terms of the end-to-end latency. We propose a novel online learning algorithm based on the principle of optimism in the face of uncertainty, which outperforms the state-of-the-art algorithms by up to approximately 1 s. Our results highlight the significance of heavily discounting the past rewards in dynamic environments.
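To illustrate how optimism and discounting of past rewards can be combined, the sketch below implements a generic discounted upper-confidence-bound rule for server selection. It is not the algorithm proposed in the paper, whose details are not given in the abstract. The class name, the `discount` and `exploration` parameters, and the choice of negative latency as the reward are all assumptions.

```python
import math


class DiscountedUCB:
    """Generic discounted UCB sketch (illustrative, not the paper's method)."""

    def __init__(self, num_arms, discount=0.9, exploration=1.0):
        self.discount = discount
        self.exploration = exploration
        self.counts = [0.0] * num_arms  # discounted number of pulls
        self.sums = [0.0] * num_arms    # discounted sum of rewards

    def select(self):
        # Try every arm once before using the index.
        for arm, count in enumerate(self.counts):
            if count == 0.0:
                return arm
        total = max(sum(self.counts), 1.0)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = self.exploration * math.sqrt(math.log(total) / self.counts[arm])
            return mean + bonus  # optimistic estimate

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        # Discount all past observations, then add the new one.
        self.counts = [c * self.discount for c in self.counts]
        self.sums = [s * self.discount for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```

In a latency setting one would call `update(arm, -latency)` after each offloaded task. A smaller `discount` forgets old observations faster, which is the kind of heavy discounting the abstract reports as important in dynamic environments.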