Krishna Jagannathan

22 papers

364 citations

Novelty 43%

AI Score 41

Ranked #92,887 of 205,806 authors (top 45%)

Ranked #20,552 in LG (top 49%)

22 Papers

SY Mar 29, 2018

Stability, convergence and Hopf bifurcation analyses of the classical car-following model

Gopal Krishna Kamath, Krishna Jagannathan, Gaurav Raina

Reaction delays play an important role in determining the qualitative dynamical properties of a platoon of vehicles traversing a straight road. In this paper, we investigate the impact of delayed feedback on the dynamics of the Classical Car-Following Model (CCFM). Specifically, we analyze the CCFM in no-delay, small-delay and arbitrary-delay regimes. First, we derive a sufficient condition for local stability of the CCFM in the no-delay and small-delay regimes. Next, we derive the necessary and sufficient condition for local stability of the CCFM for an arbitrary delay. We then demonstrate that the transition of traffic flow from the locally stable to the unstable regime occurs via a Hopf bifurcation, thus resulting in limit cycles in system dynamics. Physically, these limit cycles manifest as back-propagating congestion waves on highways. In the context of human-driven vehicles, our work provides phenomenological insight into the impact of reaction delays on the emergence and evolution of traffic congestion. In the context of self-driven vehicles, our work has the potential to provide design guidelines for control algorithms running in self-driven cars to avoid undesirable phenomena. Specifically, designing control algorithms that avoid jerky vehicular movements is essential. Hence, we derive the necessary and sufficient condition for non-oscillatory convergence of the CCFM. Next, we characterize the rate of convergence of the CCFM, and bring forth the interplay between local stability, non-oscillatory convergence and the rate of convergence of the CCFM. Further, to better understand the oscillations in the system dynamics, we characterize the type of the Hopf bifurcation and the asymptotic orbital stability of the limit cycles using Poincaré normal forms and center manifold theory. The analysis is complemented with stability charts, bifurcation diagrams and MATLAB simulations.

ML May 12, 2022

A Survey of Risk-Aware Multi-Armed Bandits

Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest, and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to define risk-aware bandit problems. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests, as well as the best-arm identification setting, which is a pure exploration problem, both in the context of risk-sensitive measures. We conclude by commenting on persisting challenges and fertile areas for future research.

SY May 24, 2018

Impact of delayed acceleration feedback on the classical car-following model

Gopal Krishna Kamath, Krishna Jagannathan, Gaurav Raina

Delayed feedback plays a vital role in determining the qualitative dynamical properties of a platoon of vehicles driving on a straight road. Motivated by the positive impact of Delayed Acceleration Feedback (DAF) in various scenarios, in this paper, we incorporate DAF into the Classical Car-Following Model (CCFM). We begin by deriving the Classical Car-Following Model with Delayed Acceleration Feedback (CCFM-DAF). We then derive the necessary and sufficient condition for local stability of the CCFM-DAF. Next, we show that the CCFM-DAF transits from the locally stable to the unstable regime via a Hopf bifurcation, thus leading to the emergence of limit cycles in system dynamics. We then propose a suitable linear transformation that enables us to analyze the local bifurcation properties of the CCFM-DAF by studying the analogous properties of the CCFM. We also study the impact of DAF on three important dynamical properties of the CCFM, namely, non-oscillatory convergence, string stability and robust stability. Our analyses are complemented with a stability chart and a bifurcation diagram. Our work reveals the following detrimental effects of DAF on the CCFM: (i) reduction in the locally stable region, (ii) increase in the frequency of the emergent limit cycles, (iii) decrease in the amplitude of the emergent limit cycles, (iv) destruction of the non-oscillatory property, (v) increased risk of string instability, and (vi) reduced resilience towards parametric uncertainty. Thus, we report a practically-relevant application wherein DAF degrades the performance in several metrics of interest.

NI Dec 24, 2018

Right buffer sizing matters: some dynamical and statistical studies on Compound TCP

Debayani Ghosh, Krishna Jagannathan, Gaurav Raina

Motivated by recent concerns that queuing delays in the Internet are on the rise, we conduct a performance evaluation of Compound TCP (C-TCP) in two topologies: a single bottleneck and a multi-bottleneck topology, under different traffic scenarios. The first topology consists of a single bottleneck router, and the second consists of two distinct sets of TCP flows, regulated by two edge routers, feeding into a common core router. We focus on some dynamical and statistical properties of the underlying system. From a dynamical perspective, we develop fluid models in a regime wherein the number of flows is large, bandwidth-delay product is high, buffers are dimensioned small (independent of the bandwidth-delay product) and routers deploy a Drop-Tail queue policy. A detailed local stability analysis for these models yields the following key insight: smaller buffers favour stability. Additionally, we highlight that larger buffers, in addition to increasing latency, are prone to inducing limit cycles in the system dynamics, via a Hopf bifurcation. These limit cycles in turn cause synchronisation among the TCP flows, and also result in a loss of link utilisation. For the topologies considered, we also empirically analyse some statistical properties of the bottleneck queues. These statistical analyses serve to validate an important modelling assumption: that in the regime considered, each bottleneck queue may be approximated as either an $M/M/1/B$ or an $M/D/1/B$ queue. This immediately makes the modelling perspective attractive and the analysis tractable. Finally, we show that smaller buffers, in addition to ensuring stability and low latency, would also yield fairly good system performance, in terms of throughput and flow completion times.

72.3 IT May 8

Learning to Transmit Over Unknown Erasure Channels with Empirical Erasure Rate Feedback

Haricharan Balasundaram, Krishna Jagannathan

We address the problem of reliable data transmission within a finite time horizon $T$ over a binary erasure channel with unknown erasure probability. We consider a feedback model wherein the transmitter can query the receiver infrequently and obtain the empirical erasure rate experienced by the latter. We aim to minimize a regret quantity, i.e., how much worse a strategy performs compared to an oracle who knows the probability of erasure, while operating at the same block error rate. A learning vs. exploitation dilemma manifests in this scenario: specifically, we need to balance between (i) learning the erasure probability with reasonable accuracy and (ii) utilizing the channel to transmit as many information bits as possible. We propose two strategies: (i) a two-phase approach using rate estimation followed by transmission that achieves an $O(T^{2/3})$ regret using only one query, and (ii) a windowing strategy using geometrically-increasing window sizes that achieves an $O(\sqrt{T})$ regret using $O(\log T)$ queries.

QUANT-PH Jun 28, 2024

Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States

Bharati. K, Vikesh Siddhu, Krishna Jagannathan

Entanglement is a key resource for a wide range of tasks in quantum information and computing. Thus, verifying availability of this quantum resource is essential. Extensive research on entanglement detection has led to no-go theorems (Lu et al. [Phys. Rev. Lett., 116, 230501 (2016)]) that highlight the need for full state tomography (FST) in the absence of adaptive or joint measurements. Recent advancements, as proposed by Zhu, Teo, and Englert [Phys. Rev. A, 81, 052339, 2010], introduce a single-parameter family of entanglement witness measurements which are capable of conclusively detecting certain entangled states and only resort to FST when all witness measurements are inconclusive. We find a variety of realistic noisy two-qubit quantum states $\mathcal{F}$ that yield conclusive results under this witness family. We solve the problem of detecting entanglement among $K$ quantum states in $\mathcal{F}$, of which $m$ states are entangled, with $m$ potentially unknown. We recognize a structural connection of this problem to the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). In contrast to existing quantum bandit frameworks, we establish a new correspondence tailored for entanglement detection and term it the $(m,K)$-quantum Multi-Armed Bandit. We implement two well-known MAB policies for arbitrary states derived from $\mathcal{F}$, present theoretical guarantees on the measurement/sample complexity and demonstrate the practicality of the policies through numerical simulations. More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.

LG Jun 12, 2024

A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP

Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear function approximation (LFA) for policy evaluation. We derive finite-sample bounds that hold (i) in the mean-squared sense and (ii) with high probability under tail iterate averaging, both with and without regularization. Our bounds exhibit an exponentially decaying dependence on the initial error and a convergence rate of $O(1/t)$ after $t$ iterations. Moreover, for the regularized TD variant, our bound holds for a universal step size. Next, we integrate a Simultaneous Perturbation Stochastic Approximation (SPSA)-based actor update with an LFA critic and establish an $O(n^{-1/4})$ convergence guarantee, where $n$ denotes the iterations of the SPSA-based actor-critic algorithm. These results establish finite-sample theoretical guarantees for risk-sensitive actor-critic methods in reinforcement learning, with a focus on variance as a risk measure.

ML Nov 16, 2021

Online Estimation and Optimization of Utility-Based Shortfall Risk

Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth et al.

Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly popular in financial applications, owing to certain desirable properties that it enjoys. We consider the problem of estimating UBSR in a recursive setting, where samples from the underlying loss distribution are available one at a time. We cast the UBSR estimation problem as a root-finding problem, and propose stochastic approximation-based estimation schemes. We derive non-asymptotic bounds on the estimation error in the number of samples. We also consider the problem of UBSR optimization within a parameterized class of random variables. We propose a stochastic gradient descent based algorithm for UBSR optimization, and derive non-asymptotic bounds on its convergence.
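
To illustrate the root-finding view in the abstract above, here is a minimal sketch of a Robbins-Monro style recursion. It assumes one common convention, in which the UBSR of a loss $X$ is the root in $t$ of $E[\ell(X - t)] = \lambda$ for an increasing loss function $\ell$. The exponential loss, the $1/k$ step size, and the name `ubsr_sa` are illustrative assumptions, not details taken from the paper.

```python
import math


def ubsr_sa(samples, loss=math.exp, lam=1.0, t0=0.0):
    """Stochastic-approximation sketch for the root of E[loss(X - t)] = lam.

    Assumptions (not from the paper): exponential loss by default,
    step size 1/k, and samples consumed one at a time.
    """
    t = t0
    for k, x in enumerate(samples, start=1):
        step = 1.0 / k
        # E[loss(X - t)] decreases in t, so raise t when the
        # sampled loss exceeds the threshold lam, and lower it otherwise.
        t += step * (loss(x - t) - lam)
    return t


# Degenerate sanity check: if X is always 1.0 and lam = 1, the root of
# exp(1 - t) = 1 is t = 1, so the iterate should drift toward 1.
estimate = ubsr_sa([1.0] * 5000)
```

With random samples the same loop applies unchanged; only the input sequence differs.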

LG Aug 28, 2020

Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits

Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

Traditional multi-armed bandit (MAB) formulations usually make certain assumptions about the underlying arms' distributions, such as bounds on the support or their tail behaviour. Moreover, such parametric information is usually 'baked' into the algorithms. In this paper, we show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified. Our key contributions are twofold: (i) We establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) We propose two classes of algorithms that are asymptotically near-optimal. Additionally, we consider a risk-aware criterion for best arm identification, where the objective associated with each arm is a linear combination of the mean and the conditional value at risk (CVaR). Throughout, we make a very mild 'bounded moment' assumption, which lets us work with both light-tailed and heavy-tailed distributions within a unified framework.

LG Jun 22, 2020

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha et al.

We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms' distributions, we show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret. This result highlights the inevitable statistical fragility of all 'logarithmic regret' bandit algorithms available in the literature. For instance, if a UCB algorithm designed for $\sigma$-subGaussian distributions is used in a subGaussian setting with a mismatched variance parameter, the learning performance could be inconsistent. Next, we show a positive result: statistically robust and consistent learning performance is attainable if we allow the regret to be slightly worse than logarithmic. Specifically, we propose three classes of distribution oblivious algorithms that achieve an asymptotic regret that is arbitrarily close to logarithmic.
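
The UCB example in the abstract above can be made concrete with a short sketch. It uses a common textbook index, empirical mean plus `sigma * sqrt(2 log t / n)`, which is an assumption here and not necessarily the form analyzed in the paper. The `sigma` argument is where the parametric information is baked in; the function name `ucb_run` and the `pull` callback are hypothetical.

```python
import math
import random


def ucb_run(pull, n_arms, horizon, sigma=1.0):
    """Run a textbook-style UCB rule; returns the pull count per arm.

    `pull(arm)` returns a reward. `sigma` is the assumed sub-Gaussian
    scale; the index form below is an illustrative choice.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialize
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + sigma * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts


# Toy usage: two Gaussian arms with means 0 and 1 and unit variance.
rng = random.Random(0)
means = [0.0, 1.0]
counts = ucb_run(lambda a: rng.gauss(means[a], 1.0), n_arms=2, horizon=2000)
```

Passing a `sigma` that is smaller than the true noise scale shrinks the exploration bonus, which is the kind of mismatch the abstract warns about.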

LG Jun 17, 2020

Constrained regret minimization for multi-criterion multi-armed bandits

Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm is determined by several, possibly conflicting attributes. The aim is to optimize a 'primary' attribute subject to user-provided constraints on other 'secondary' attributes. We assume that the attributes can be estimated using samples from the arms' distributions, and that the estimators enjoy suitable concentration properties. We propose an algorithm called Con-LCB that guarantees a logarithmic regret, i.e., the average number of plays of all non-optimal arms is at most logarithmic in the horizon. The algorithm also outputs a Boolean flag that correctly identifies, with high probability, whether the given instance is feasible/infeasible with respect to the constraints. We also show that Con-LCB is optimal within a universal constant, i.e., that more sophisticated algorithms cannot do much better universally. Finally, we establish a fundamental trade-off between regret minimization and feasibility identification. Our framework finds natural applications, for instance, in financial portfolio optimization, where risk constrained maximization of expected return is meaningful.

LG Nov 20, 2019

A Framework for End-to-End Deep Learning-Based Anomaly Detection in Transportation Networks

Neema Davis, Gaurav Raina, Krishna Jagannathan

We develop an end-to-end deep learning-based anomaly detection model for temporal data in transportation networks. The proposed EVT-LSTM model is derived from the popular LSTM (Long Short-Term Memory) network and adopts an objective function that is based on fundamental results from EVT (Extreme Value Theory). We compare the EVT-LSTM model with some established statistical, machine learning, and hybrid deep learning baselines. Experiments on seven diverse real-world data sets demonstrate the superior anomaly detection performance of our proposed model over the other models considered in the comparison study.

LG Sep 13, 2019

LSTM-Based Anomaly Detection: Detection Rules from Extreme Value Theory

Neema Davis, Gaurav Raina, Krishna Jagannathan

In this paper, we explore various statistical techniques for anomaly detection in conjunction with the popular Long Short-Term Memory (LSTM) deep learning model for transportation networks. We obtain the prediction errors from an LSTM model, and then apply three statistical models based on (i) the Gaussian distribution, (ii) Extreme Value Theory (EVT), and (iii) Tukey's method. Using statistical tests and numerical studies, we find strong evidence against the widely employed Gaussian distribution based detection rule on the prediction errors. Next, motivated by fundamental results from Extreme Value Theory, we propose a detection technique that does not assume any parent distribution on the prediction errors. Through numerical experiments conducted on several real-world traffic data sets, we show that the EVT-based detection rule is superior to other detection rules, and is supported by statistical evidence.
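
As a small illustration of two of the detection rules named in the abstract above, the sketch below applies a Gaussian three-sigma rule and Tukey's fences to a list of prediction errors. The thresholds (`z=3.0`, `k=1.5`), the function names, and the toy data are assumptions for illustration; the paper's EVT-based rule is not reproduced here.

```python
import statistics


def tukey_outliers(errors, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [e for e in errors if e < low or e > high]


def gaussian_outliers(errors, z=3.0):
    """Flag values more than z sample standard deviations from the mean."""
    mu = statistics.mean(errors)
    sd = statistics.stdev(errors)
    return [e for e in errors if abs(e - mu) > z * sd]


# Toy prediction errors (hypothetical): one extreme value inflates the
# standard deviation enough to hide itself from the three-sigma rule.
toy_errors = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
tukey_flags = tukey_outliers(toy_errors)
gaussian_flags = gaussian_outliers(toy_errors)
```

On this toy input the quartile-based fences are unaffected by the extreme value, while the mean and standard deviation are not, which is one reason a Gaussian rule can be fragile on heavy-tailed errors.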

LG Jun 3, 2019

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards

Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework. We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, our goal is to devise algorithms that are entirely distribution oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms. In this paper, we provide a class of such algorithms with provable upper bounds on the probability of incorrect identification. In the process, we develop a novel estimator for the CVaR of unbounded (including heavy-tailed) random variables and prove a concentration inequality for the same, which could be of independent interest. We also compare the error bounds for our distribution oblivious algorithms with those corresponding to standard non-oblivious algorithms. Finally, numerical experiments reveal that our algorithms perform competitively when compared with non-oblivious algorithms, suggesting that distribution obliviousness can be realised in practice without incurring a significant loss of performance.

LG Feb 18, 2019

Grids versus Graphs: Partitioning Space for Improved Taxi Demand-Supply Forecasts

Neema Davis, Gaurav Raina, Krishna Jagannathan

Accurate taxi demand-supply forecasting is a challenging application of ITS (Intelligent Transportation Systems), due to the complex spatial and temporal patterns. We investigate the impact of different spatial partitioning techniques on the prediction performance of an LSTM (Long Short-Term Memory) network, in the context of taxi demand-supply forecasting. We consider two tessellation schemes: (i) the variable-sized Voronoi tessellation, and (ii) the fixed-sized Geohash tessellation. While the widely employed ConvLSTM (Convolutional LSTM) can model fixed-sized Geohash partitions, the standard convolutional filters cannot be applied on the variable-sized Voronoi partitions. To explore the Voronoi tessellation scheme, we propose the use of GraphLSTM (Graph-based LSTM), by representing the Voronoi spatial partitions as nodes on an arbitrarily structured graph. The GraphLSTM offers competitive performance against ConvLSTM, at lower computational complexity, across three real-world large-scale taxi demand-supply data sets, with different performance metrics. To ensure superior performance across diverse settings, a HEDGE based ensemble learning algorithm is applied over the ConvLSTM and the GraphLSTM networks.

LG Jan 4, 2019

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

Prashanth L. A., Krishna Jagannathan, Ravi Kumar Kolla

Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light-tailed case, we use a classical CVaR estimator based on the empirical distribution constructed from the samples. For heavy-tailed random variables, we assume a mild 'bounded moment' condition, and derive a concentration bound for a truncation-based estimator. Notably, our concentration bounds enjoy an exponential decay in the sample size, for heavy-tailed as well as light-tailed distributions. To demonstrate the applicability of our concentration results, we consider a CVaR optimization problem in a multi-armed bandit setting. Specifically, we address the best CVaR-arm identification problem under a fixed budget. We modify the well-known successive rejects algorithm to incorporate a CVaR-based criterion. Using the CVaR concentration result, we derive an upper-bound on the probability of incorrect identification by the proposed algorithm.
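
For readers unfamiliar with the empirical estimator mentioned in the abstract above, the following is a minimal sketch of one standard form: take the empirical Value-at-Risk as an order statistic, then add the average excess over it scaled by $1/(1-\alpha)$. The exact definition used in the paper may differ in details such as the choice of order statistic; the function name `empirical_cvar` is hypothetical, and the truncation-based variant for heavy tails is not shown.

```python
import math


def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha in (0, 1) for loss samples.

    Uses VaR = the ceil(n * alpha)-th order statistic and
    CVaR = VaR + sum((x - VaR)^+) / (n * (1 - alpha)).
    This is one common convention, assumed here for illustration.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(n * alpha))
    var = xs[k - 1]
    excess = sum(max(x - var, 0.0) for x in xs)
    return var + excess / (n * (1.0 - alpha))


# Toy usage: losses 1..10 at level 0.5. The empirical VaR is 5 and the
# estimate equals the mean of the upper half {6, ..., 10}.
cvar_half = empirical_cvar(range(1, 11), 0.5)
```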

LG Dec 10, 2018

Taxi Demand-Supply Forecasting: Impact of Spatial Partitioning on the Performance of Neural Networks

Neema Davis, Gaurav Raina, Krishna Jagannathan

In this paper, we investigate the significance of choosing an appropriate tessellation strategy for a spatio-temporal taxi demand-supply modeling framework. Our study compares (i) the variable-sized polygon based Voronoi tessellation, and (ii) the fixed-sized grid based Geohash tessellation, using taxi demand-supply GPS data for the cities of Bengaluru, India and New York, USA. Long Short-Term Memory (LSTM) networks are used for modeling and incorporating information from spatial neighbors into the model. We find that the LSTM model based on input features extracted from a variable-sized polygon tessellation yields superior performance over the LSTM model based on fixed-sized grid tessellation. Our study highlights the need to explore multiple spatial partitioning techniques for improving the prediction performance in neural network models.

LG Aug 6, 2018

Concentration bounds for empirical conditional value-at-risk: The unbounded case

Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat et al.

In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbounded random variable, which is either sub-Gaussian or sub-exponential. We derive a novel one-sided concentration bound for a natural sample-based CVaR estimator in this setting. Our bound relies on a concentration result for a quantile-based estimator for Value-at-Risk (VaR), which may be of independent interest.

LG May 17, 2018

Taxi demand forecasting: A HEDGE based tessellation strategy for improved accuracy

Neema Davis, Gaurav Raina, Krishna Jagannathan

A key problem in location-based modeling and forecasting lies in identifying suitable spatial and temporal resolutions. In particular, judicious spatial partitioning can play a significant role in enhancing the performance of location-based forecasting models. In this work, we investigate two widely used tessellation strategies for partitioning city space, in the context of real-time taxi demand forecasting. Our study compares (i) Geohash tessellation, and (ii) Voronoi tessellation, using two distinct taxi demand datasets, over multiple time scales. For the purpose of comparison, we employ classical time-series tools to model the spatio-temporal demand. Our study finds that the performance of each tessellation strategy is highly dependent on the city geography, spatial distribution of the data, and the time of the day, and that neither strategy is found to perform optimally across the forecast horizon. We propose a hybrid tessellation algorithm that picks the best tessellation strategy at each instant, based on their performance in the recent past. Our hybrid algorithm is a non-stationary variant of the well-known HEDGE algorithm for choosing the best advice from multiple experts. We show that the hybrid tessellation strategy performs consistently better than either of the two strategies across the data sets considered, at multiple time scales, and with different performance metrics. We achieve an average accuracy of above 80% per km^2 for both data sets considered at 60 minute aggregation levels.
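
The idea of picking between two tessellation strategies based on recent performance can be sketched with the standard HEDGE (exponential weights) update. The abstract does not specify how its non-stationary variant is built, so the `discount` factor below, which down-weights older losses, is an assumption, as are the learning rate `eta` and the function name `hedge_weights`.

```python
import math


def hedge_weights(loss_history, eta=0.5, discount=1.0):
    """Exponential-weights distribution over experts.

    `loss_history` is a list of per-round loss vectors, one entry per
    expert (e.g. Geohash and Voronoi forecasts). `discount < 1` forgets
    old losses; this is a hypothetical stand-in for a non-stationary
    variant, not the paper's construction.
    """
    n_experts = len(loss_history[0])
    cumulative = [0.0] * n_experts
    for losses in loss_history:
        cumulative = [discount * c + l for c, l in zip(cumulative, losses)]
    best = min(cumulative)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (c - best)) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Toy usage: expert 0 has zero loss and expert 1 has unit loss for
# ten rounds, so nearly all weight moves to expert 0.
w = hedge_weights([[0.0, 1.0]] * 10)
```

A forecaster can then follow the expert with the largest weight at each instant, or sample an expert in proportion to the weights.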

SY Jul 20, 2017

The Modified Optimal Velocity Model: Stability Analyses and Design Guidelines

Gopal Krishna Kamath, Krishna Jagannathan, Gaurav Raina

Reaction delays are important in determining the qualitative dynamical properties of a platoon of vehicles traveling on a straight road. In this paper, we investigate the impact of delayed feedback on the dynamics of the Modified Optimal Velocity Model (MOVM). Specifically, we analyze the MOVM in three regimes -- no delay, small delay and arbitrary delay. In the absence of reaction delays, we show that the MOVM is locally stable. For small delays, we then derive a sufficient condition for the MOVM to be locally stable. Next, for an arbitrary delay, we derive the necessary and sufficient condition for the local stability of the MOVM. We show that the traffic flow transits from the locally stable to the locally unstable regime via a Hopf bifurcation. We also derive the necessary and sufficient condition for non-oscillatory convergence and characterize the rate of convergence of the MOVM. These conditions help ensure smooth traffic flow, good ride quality and quick equilibration to the uniform flow. Further, since a Hopf bifurcation results in the emergence of limit cycles, we provide an analytical framework to characterize the type of the Hopf bifurcation and the asymptotic orbital stability of the resulting non-linear oscillations. Finally, we corroborate our analyses using stability charts, bifurcation diagrams, numerical computations and simulations conducted using MATLAB.

LG Nov 30, 2016

Bandit algorithms to emulate human decision making using probabilistic distortions

Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan et al.

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits. For the regret minimization setting in $K$-armed as well as linear bandit problems, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate reward distortions, and exhibit sublinear regret. For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm. For the linearly parameterized setting, our algorithm achieves a regret upper bound that is of the same order as that of the standard linear bandit algorithm, Optimism in the Face of Uncertainty Linear (OFUL), and unlike OFUL, our algorithm handles distortions and an arm-dependent noise model. For the best arm identification problem in the $K$-armed bandit setting, we propose algorithms, derive guarantees on their performance, and also show that these algorithms are order optimal by proving matching fundamental limits on performance. For best arm identification in linear bandits, we propose an algorithm and establish sample complexity guarantees. Finally, we present simulation experiments which demonstrate the advantages resulting from using distortion-aware learning algorithms in a vehicular traffic routing application.

LG Feb 29, 2016

Collaborative Learning of Stochastic Bandits over a Social Network

Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan

We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as its neighbours in the social network. We perform a regret analysis of various policies in this collaborative learning setting. A key finding of this paper is that natural extensions of widely-studied single agent learning policies to the network setting need not perform well in terms of regret. In particular, we identify a class of non-altruistic and individually consistent policies, and argue by deriving regret lower bounds that they are liable to suffer a large regret in the networked setting. We also show that the learning performance can be substantially improved if the agents exploit the structure of the network, and develop a simple learning algorithm based on dominating sets of the network. Specifically, we first consider a star network, which is a common motif in hierarchical social networks, and show analytically that the hub agent can be used as an information sink to expedite learning and improve the overall regret. We also derive network-wide regret bounds for the algorithm applied to general networks. We conduct numerical experiments on a variety of networks to corroborate our analytical results.