Sumit J. Darak

LG
7papers
33citations
Novelty48%
AI Score23

7 Papers

ITFeb 16, 2022
Exploiting Side Information for Improved Online Learning Algorithms in Wireless Networks

Manjesh K. Hanawal, Sumit J. Darak

In wireless networks, the rate achieved depends on factors like level of interference, hardware impairments, and channel gain. Often, instantaneous values of some of these factors can be measured, and they provide useful information about the instantaneous rate achieved. For example, higher interference implies a lower rate. In this work, we treat any such measurable quality that has a non-zero correlation with the rate achieved as side-information and study how it can be exploited to quickly learn the channel that offers higher throughput (reward). When the mean value of the side-information is known, using control variate theory we develop algorithms that require fewer samples to learn the parameters and can improve the learning rate compared to cases where side-information is ignored. Specifically, we incorporate side-information in the classical Upper Confidence Bound (UCB) algorithm and quantify the gain achieved in the regret performance. We show that the gain is proportional to the amount of the correlation between the reward and associated side-information. We discuss in detail various side-information that can be exploited in cognitive radio and air-to-ground communication in $L-$band. We demonstrate that correlation between the reward and side-information is often strong in practice and exploiting it improves the throughput significantly.

SYJun 5, 2021
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?

S. V. Sai Santosh, Sumit J. Darak

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via exploration-exploitation trade-off without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demand deployment on edge devices, and hence, a mapping on system-on-chip (SoC) is desired. Theoretically, the Bayesian approach-based Thompson Sampling (TS) algorithm offers better performance than the frequentist approach-based Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable due to Beta function. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on Zynq SoC. In practice, the type of arms distribution (e.g., Bernoulli, Gaussian, etc.) is unknown and hence, a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of appropriate MAB algorithms for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for parallel implementation of algorithms resulting in huge savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word-length, and hardware-software co-design approaches. We demonstrate the superiority of the RI-MAB over TS and UCB only architectures.

NIMar 6, 2020
Distributed Learning in Ad-Hoc Networks: A Multi-player Multi-armed Bandit Framework

Sumit J. Darak, Manjesh K. Hanawal

Next-generation networks are expected to be ultra-dense with a very high peak rate but relatively lower expected traffic per user. For such scenario, existing central controller based resource allocation may incur substantial signaling (control communications) leading to a negative effect on the quality of service (e.g. drop calls), energy and spectrum efficiency. To overcome this problem, cognitive ad-hoc networks (CAHN) that share spectrum with other networks are being envisioned. They allow some users to identify and communicate in `free slots' thereby reducing signaling load and allowing the higher number of users per base stations (dense networks). Such networks open up many interesting challenges such as resource identification, coordination, dynamic and context-aware adaptation for which Machine Learning and Artificial Intelligence framework offers novel solutions. In this paper, we discuss state-of-the-art multi-armed multi-player bandit based distributed learning algorithms that allow users to adapt to the environment and coordinate with other players/users. We also discuss various open research problems for feasible realization of CAHN and interesting applications in other domains such as energy harvesting, Internet of Things, and Smart grids.

SPFeb 18, 2020
Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

S. V. Sai Santosh, Sumit J. Darak

Online machine learning (OML) algorithms do not need any training phase and can be deployed directly in an unknown environment. OML includes multi-armed bandit (MAB) algorithms that can identify the best arm among several arms by achieving a balance between exploration of all arms and exploitation of optimal arm. The Kullback-Leibler divergence based upper confidence bound (KLUCB) is the state-of-the-art MAB algorithm that optimizes exploration-exploitation trade-off but it is complex due to underlining optimization routine. This limits its usefulness for robotics and radio applications which demand integration of KLUCB with the PHY on the system on chip (SoC). In this paper, we efficiently map the KLUCB algorithm on SoC by realizing optimization routine via alternative synthesizable computation without compromising on the performance. The proposed architecture is dynamically reconfigurable such that the number of arms, as well as type of algorithm, can be changed on-the-fly. Specifically, after initial learning, on-the-fly switch to light-weight UCB offers around 10-factor improvement in latency and throughput. Since learning duration depends on the unknown arm statistics, we offer intelligence embedded in architecture to decide the switching instant. We validate the functional correctness and usefulness of the proposed architecture via a realistic wireless application and detailed complexity analysis demonstrates its feasibility in realizing intelligent radios.

SPDec 11, 2019
Novel Deep Learning Framework for Wideband Spectrum Characterization at Sub-Nyquist Rate

Shivam Chandhok, Himani Joshi, A V Subramanyam et al.

Introduction of spectrum-sharing in 5G and subsequent generation networks demand base-station(s) with the capability to characterize the wideband spectrum spanned over licensed, shared and unlicensed non-contiguous frequency bands. Spectrum characterization involves the identification of vacant bands along with center frequency and parameters (energy, modulation, etc.) of occupied bands. Such characterization at Nyquist sampling is area and power-hungry due to the need for high-speed digitization. Though sub-Nyquist sampling (SNS) offers an excellent alternative when the spectrum is sparse, it suffers from poor performance at low signal to noise ratio (SNR) and demands careful design and integration of digital reconstruction, tunable channelizer and characterization algorithms. In this paper, we propose a novel deep-learning framework via a single unified pipeline to accomplish two tasks: 1)~Reconstruct the signal directly from sub-Nyquist samples, and 2)~Wideband spectrum characterization. The proposed approach eliminates the need for complex signal conditioning between reconstruction and characterization and does not need complex tunable channelizers. We extensively compare the performance of our framework for a wide range of modulation schemes, SNR and channel conditions. We show that the proposed framework outperforms existing SNS based approaches and characterization performance approaches to Nyquist sampling-based framework with an increase in SNR. Easy to design and integrate along with a single unified deep learning framework make the proposed architecture a good candidate for reconfigurable platforms.

LGJan 12, 2019
Multiplayer Multi-armed Bandits for Optimal Assignment in Heterogeneous Networks

Harshvardhan Tibrewal, Sravan Patchala, Manjesh K. Hanawal et al.

We consider an ad hoc network where multiple users access the same set of channels. The channel characteristics are unknown and could be different for each user (heterogeneous). No controller is available to coordinate channel selections by the users, and if multiple users select the same channel, they collide and none of them receive any rate (or reward). For such a completely decentralized network we develop algorithms that aim to achieve optimal network throughput. Due to lack of any direct communication between the users, we allow each user to exchange information by transmitting in a specific pattern and sense such transmissions from others. However, such transmissions and sensing for information exchange do not add to network throughput. For the wideband sensing and narrowband sensing scenarios, we first develop explore-and-commit algorithms that converge to near-optimal allocation with high probability in a small number of rounds. Building on this, we develop an algorithm that gives logarithmic regret, even when the number of users changes with time. We validate our claims through extensive experiments and show that our algorithms perform significantly better than the state-of-the-art CSM-MAB, dE3 and dE3-TS algorithms.

LGSep 17, 2018
Multi-Player Bandits: A Trekking Approach

Manjesh K. Hanawal, Sumit J. Darak

We study stochastic multi-armed bandits with many players. The players do not know the number of players, cannot communicate with each other and if multiple players select a common arm they collide and none of them receive any reward. We consider the static scenario, where the number of players remains fixed, and the dynamic scenario, where the players enter and leave at any time. We provide algorithms based on a novel `trekking approach' that guarantees constant regret for the static case and sub-linear regret for the dynamic case with high probability. The trekking approach eliminates the need to estimate the number of players resulting in fewer collisions and improved regret performance compared to the state-of-the-art algorithms. We also develop an epoch-less algorithm that eliminates any requirement of time synchronization across the players provided each player can detect the presence of other players on an arm. We validate our theoretical guarantees using simulation based and real test-bed based experiments.