SYMar 31, 2019
Completely Uncoupled User Association Algorithms for State Dependent Wireless NetworksS. Ramakrishnan, Venkatesh Ramaiyan, K. P. Naveen
We study a distributed user association algorithm for a heterogeneous wireless network with the objective of maximizing the sum of the utilities (on the received throughput of wireless users). We consider a state dependent wireless network, where the rate achieved by the users are a function of their user associations as well as the state of the system. We consider three different scenarios depending on the state evolution and the users$\text{'}$ knowledge of the system state. In this context, we present completely uncoupled user association algorithms for utility maximization where the users$\text{'}$ association is entirely a function of its past associations and its received throughput. In particular, the user is oblivious to the association of the other users in the network. Using the theory of perturbed Markov chains, we show the optimality of our algorithms under appropriate scenarios.
LGNov 9, 2017
Efficient-UCBV: An Almost Optimal Algorithm using Variance EstimatesSubhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam et al.
We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved \citep{auer2010ucb}, while taking into account the variance estimates to compute the arms' confidence bounds, similar to UCBV \citep{audibert2009exploration}. Through a theoretical analysis we establish that EUCBV incurs a \emph{gap-dependent} regret bound of {\scriptsize $O\left( \dfrac{Kσ^2_{\max} \log (TΔ^2 /K)}Δ\right)$} after $T$ trials, where $Δ$ is the minimal gap between optimal and sub-optimal arms; the above bound is an improvement over that of existing state-of-the-art UCB algorithms (such as UCB1, UCB-Improved, UCBV, MOSS). Further, EUCBV incurs a \emph{gap-independent} regret bound of {\scriptsize $O\left(\sqrt{KT}\right)$} which is an improvement over that of UCB1, UCBV and UCB-Improved, while being comparable with that of MOSS and OCUCB. Through an extensive numerical study we show that EUCBV significantly outperforms the popular UCB variants (like MOSS, OCUCB, etc.) as well as Thompson sampling and Bayes-UCB algorithms.
LGApr 7, 2017
Thresholding Bandits with Augmented UCBSubhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam et al.
In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of mis-classification) incurred by AugUCB. Although UCBEV in literature provides a better guarantee, it is important to emphasize that UCBEV has access to problem complexity (whose computation requires arms' mean and variances), and hence is not realistic in practice; this is in contrast to AugUCB whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its utilization of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non variance-based algorithms.
NIJun 20, 2015
Competitive Selection of Ephemeral Relays in Wireless NetworksK. P. Naveen, Anurag Kumar, Eitan Altman
We consider a setting in which two nodes (referred to as forwarders) compete to choose a relay node from a set of relays, as they ephemerally become available (e.g., wake up from a sleep state). Each relay, when it arrives, offers a (possibly different) "reward" to each forwarder. Each forwarder's objective is to minimize a combination of the delay incurred in choosing a relay and the reward offered by the chosen relay. As an example, we develop the reward structure for the specific problem of geographical forwarding over a network of sleep-wake cycling relays. We study two variants of the generic relay selection problem, namely, the completely observable (CO) case where, when a relay arrives, both forwarders get to observe both rewards, and the partially observable (PO) case where each forwarder can only observe its own reward. Formulating the problem as a two person stochastic game, we characterize solution in terms of Nash Equilibrium Policy Pairs (NEPPs). For the CO case we provide a general structure of the NEPPs. For the PO case we prove that there exists an NEPP within the class of threshold policy pairs. We then consider the particular application of geographical forwarding of packets in a shared network of sleep-wake cycling wireless relays. For this problem, for a particular reward structure, using realistic parameter values corresponding to TelosB wireless mote, we numerically compare the performance (in terms of cost to both forwarders) of the various NEPPs and draw the following key insight: even for moderate separation between the two forwarders, the performance of the various NEPPs is close to the performance of a simple strategy where each forwarder behaves as if the other forwarder is not present. We also conduct simulation experiments to study the end-to-end performance of the simple forwarding policy.