57.1NIMay 13
Optimal Oblivious Load-Balancing for Sparse Traffic in Large-Scale Satellite NetworksRudrapatna Vallabh Ramakanth, Eytan Modiano
Oblivious load-balancing in networks involves routing traffic from sources to destinations using predetermined routes independent of the traffic, so that the maximum load on any link in the network is minimized. We investigate oblivious load-balancing schemes for a $N\times N$ torus network under sparse traffic where there are at most $k$ active source-destination pairs. We are motivated by the problem of load-balancing in large-scale LEO satellite networks, which can be modelled as a torus, where the traffic is known to be sparse and localized to certain hotspot areas. We formulate the problem as a linear program and show that no oblivious routing scheme can achieve a worst-case load lower than approximately $\frac{\sqrt{2k}}{4}$ when $1<k \leq N^2/2$ and $\frac{N}{4}$ when $N^2/2\leq k\leq N^2$. Moreover, we demonstrate that the celebrated Valiant Load Balancing scheme is suboptimal under sparse traffic and construct an optimal oblivious load-balancing scheme that achieves the lower bound. Further, we discover a $\sqrt{2}$ multiplicative gap between the worst-case load of a non-oblivious routing and the worst-case load of any oblivious routing. The results can also be extended to general $N\times M$ tori with unequal link capacities along the vertical and horizontal directions.
NIAug 4, 2023
Learning to Schedule in Non-Stationary Wireless Networks With Unknown StatisticsQuang Minh Nguyen, Eytan Modiano
The emergence of large-scale wireless networks with partially-observable and time-varying dynamics has imposed new challenges on the design of optimal control policies. This paper studies efficient scheduling algorithms for wireless networks subject to generalized interference constraint, where mean arrival and mean service rates are unknown and non-stationary. This model exemplifies realistic edge devices' characteristics of wireless communication in modern networks. We propose a novel algorithm termed MW-UCB for generalized wireless network scheduling, which is based on the Max-Weight policy and leverages the Sliding-Window Upper-Confidence Bound to learn the channels' statistics under non-stationarity. MW-UCB is provably throughput-optimal under mild assumptions on the variability of mean service rates. Specifically, as long as the total variation in mean service rates over any time period grows sub-linearly in time, we show that MW-UCB can achieve the stability region arbitrarily close to the stability region of the class of policies with full knowledge of the channel statistics. Extensive simulations validate our theoretical results and demonstrate the favorable performance of MW-UCB.
SYMay 3, 2016
Distributed Frequency Control in Power Grids Under Limited CommunicationMarzieh Parandehgheibi, Konstantin Turitsyn, Eytan Modiano
In this paper, we analyze the impact of communication failures on the performance of optimal distributed frequency control. We consider a consensus-based control scheme, and show that it does not converge to the optimal solution when the communication network is disconnected. We propose a new control scheme that uses the dynamics of power grid to replicate the information not received from the communication network, and prove that it achieves the optimal solution under any single communication link failure. In addition, we show that this control improves cost under multiple communication link failures. Next, we analyze the impact of discrete-time communication on the performance of distributed frequency control. In particular, we will show that the convergence time increases as the time interval between two messages increases. We propose a new algorithm that uses the dynamics of the power grid, and show through simulation that it improves the convergence time of the control scheme significantly.
75.5NIApr 19
Scheduling in Multi-Hop Wireless Networks With DeadlinesNicholas Jones, Eytan Modiano
We analyze the problem of scheduling in wireless networks to meet end-to-end service guarantees, defined by instantaneous throughput and hard packet deadlines. Using a network slicing model to decouple the queueing dynamics between flows, we show that the network's ability to meet hard deadline guarantees under interference is largely influenced by the link scheduling policy. We characterize throughput- and deadline-optimal policies for a solitary flow operating in isolation, which provide bounds on feasibility in the general case with multiple flows. We prove that packet delays can grow arbitrarily large in the multi-flow setting under a worst-case stabilizing policy, showing that queue stability is not sufficient to guarantee tight deadlines. We derive conditions on end-to-end packet delays in terms of link inter-scheduling times, and show that it is possible to make hard guarantees under any interference model by solving a generalized version of the pinwheel scheduling problem. Finally, we introduce a decentralized polynomial-time algorithm which can meet tight end-to-end packet deadlines while achieving near-optimal throughput.
SYJul 24, 2012
Delay Stability Regions of the Max-Weight Policy under Heavy-Tailed TrafficMihalis G. Markakis, Eytan Modiano, John N. Tsitsiklis
We carry out a delay stability analysis (i.e., determine conditions under which expected steady-state delays at a queue are finite) for a simple 3-queue system operated under the Max-Weight scheduling policy, for the case where one of the queues is fed by heavy-tailed traffic (i.e, when the number of arrivals at each time slot has infinite second moment). This particular system exemplifies an intricate phenomenon whereby heavy-tailed traffic at one queue may or may not result in the delay instability of another queue, depending on the arrival rates. While the ordinary stability region (in the sense of convergence to a steady-state distribution) is straightforward to determine, the determination of the delay stability region is more involved: (i) we use "fluid-type" sample path arguments, combined with renewal theory, to prove delay instability outside a certain region; (ii) we use a piecewise linear Lyapunov function to prove delay stability in the interior of that same region; (iii) as an intermediate step in establishing delay stability, we show that the expected workload of a stable M/GI/1 queue scales with time as $\mathcal{O}(t^{1/(1+γ)})$, assuming that service times have a finite $1+γ$ moment, where $γ\in (0,1)$.
SYApr 24, 2024
Power Failure Cascade Prediction using Graph Neural NetworksSathwik Chadaga, Xinyu Wu, Eytan Modiano
We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.
AIApr 5, 2024
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical ReportJerrod Wigmore, Brooke Shrader, Eytan Modiano
Deep Reinforcement Learning (DRL) offers a powerful approach to training neural network control policies for stochastic queuing networks (SQN). However, traditional DRL methods rely on offline simulations or static datasets, limiting their real-world application in SQN control. This work proposes Online Deep Reinforcement Learning-based Controls (ODRLC) as an alternative, where an intelligent agent interacts directly with a real environment and learns an optimal control policy from these online interactions. SQNs present a challenge for ODRLC due to the unbounded nature of the queues within the network resulting in an unbounded state-space. An unbounded state-space is particularly challenging for neural network policies as neural networks are notoriously poor at extrapolating to unseen states. To address this challenge, we propose an intervention-assisted framework that leverages strategic interventions from known stable policies to ensure the queue sizes remain bounded. This framework combines the learning power of neural networks with the guaranteed stability of classical control policies for SQNs. We introduce a method to design these intervention-assisted policies to ensure strong stability of the network. Furthermore, we extend foundational DRL theorems for intervention-assisted policies and develop two practical algorithms specifically for ODRLC of SQNs. Finally, we demonstrate through experiments that our proposed algorithms outperform both classical control approaches and prior ODRLC algorithms.
LGJan 19, 2025
A Novel Switch-Type Policy Network for Resource Allocation Problems: Technical ReportJerrod Wigmore, Brooke Shrader, Eytan Modiano
Deep Reinforcement Learning (DRL) has become a powerful tool for developing control policies in queueing networks, but the common use of Multi-layer Perceptron (MLP) neural networks in these applications has significant drawbacks. MLP architectures, while versatile, often suffer from poor sample efficiency and a tendency to overfit training environments, leading to suboptimal performance on new, unseen networks. In response to these issues, we introduce a switch-type neural network (STN) architecture designed to improve the efficiency and generalization of DRL policies in queueing networks. The STN leverages structural patterns from traditional non-learning policies, ensuring consistent action choices across similar states. This design not only streamlines the learning process but also fosters better generalization by reducing the tendency to overfit. Our works presents three key contributions: first, the development of the STN as a more effective alternative to MLPs; second, empirical evidence showing that STNs achieve superior sample efficiency in various training scenarios; and third, experimental results demonstrating that STNs match MLP performance in familiar environments and significantly outperform them in new settings. By embedding domain-specific knowledge, the STN enhances the Proximal Policy Optimization (PPO) algorithm's effectiveness without compromising performance, suggesting its suitability for a wide range of queueing network control problems.
NIAug 6, 2021
Computation and Communication Co-Design for Real-Time Monitoring and Control in Multi-Agent SystemsVishrant Tripathi, Luca Ballotta, Luca Carlone et al.
We investigate the problem of co-designing computation and communication in a multi-agent system (e.g. a sensor network or a multi-robot team). We consider the realistic setting where each agent acquires sensor data and is capable of local processing before sending updates to a base station, which is in charge of making decisions or monitoring phenomena of interest in real time. Longer processing at an agent leads to more informative updates but also larger delays, giving rise to a delay-accuracy-tradeoff in choosing the right amount of local processing at each agent. We assume that the available communication resources are limited due to interference, bandwidth, and power constraints. Thus, a scheduling policy needs to be designed to suitably share the communication channel among the agents. To that end, we develop a general formulation to jointly optimize the local processing at the agents and the scheduling of transmissions. Our novel formulation leverages the notion of Age of Information to quantify the freshness of data and capture the delays caused by computation and communication. We develop efficient resource allocation algorithms using the Whittle index approach and demonstrate our proposed algorithms in two practical applications: multi-agent occupancy grid mapping in time-varying environments, and ride sharing in autonomous vehicle networks. Our experiments show that the proposed co-design approach leads to a substantial performance improvement (18-82% in our tests).
NIMay 27, 2021
An Online Learning Approach to Optimizing Time-Varying Costs of AoIVishrant Tripathi, Eytan Modiano
We consider systems that require timely monitoring of sources over a communication network, where the cost of delayed information is unknown, time-varying and possibly adversarial. For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight. For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to the best fixed scheduling policy in hindsight, while remaining computationally feasible. The algorithm and its regret analysis are novel and of independent interest to the study of online restless multi-armed bandit problems. We further design algorithms that achieve sublinear regret compared to the best dynamic policy when the environment is slowly varying. Finally, we apply our algorithms to a mobility tracking problem. We consider non-stationary and adversarial mobility models and illustrate the performance benefit of using our online learning algorithms compared to an oblivious scheduling policy.
NIDec 16, 2020
Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing DelayXinzhe Fu, Eytan Modiano
Network Utility Maximization (NUM) studies the problems of allocating traffic rates to network users in order to maximize the users' total utility subject to network resource constraints. In this paper, we propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown apriori and the utility function values of the traffic rates can be observed only after the corresponding traffic is delivered to the destination, which means that the utility feedback experiences \textit{queueing delay}. The goal is to design a policy that gradually learns the utility functions and makes rate allocation and network scheduling/routing decisions so as to maximize the total utility obtained over a finite time horizon $T$. In addition to unknown utility functions and stochastic constraints, a central challenge of our problem lies in the queueing delay of the observations, which may be unbounded and depends on the decisions of the policy. We first show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem. Without the presence of feedback delay, we design an algorithm based on the ideas of gradient estimation and Max-Weight scheduling. To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of the Learning-NUM framework, we apply it to three application scenarios including database query, job scheduling and video streaming. We further conduct simulations on the job scheduling application to evaluate the empirical performance of our policy.
SYDec 16, 2020
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic ArrivalsEray Unsal Atay, Igor Kadota, Eytan Modiano
We consider a single-hop wireless network with sources transmitting time-sensitive information to the destination over multiple unreliable channels. Packets from each source are generated according to a stochastic process with known statistics and the state of each wireless channel (ON/OFF) varies according to a stochastic process with unknown statistics. The reliability of the wireless channels is to be learned through observation. At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel. The probability of a successful transmission to the destination depends on the reliability of the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots. To analyze the performance of the learning algorithm, we introduce the notion of AoI regret, which is the difference between the expected cumulative AoI of the learning algorithm under consideration and the expected cumulative AoI of a genie algorithm that knows the reliability of the channels a priori. The AoI regret captures the penalty incurred by having to learn the statistics of the channels over the $T$ time slots. The results are two-fold: first, we consider learning algorithms that employ well-known solutions to the stochastic multi-armed bandit problem (such as $ε$-Greedy, Upper Confidence Bound, and Thompson Sampling) and show that their AoI regret scales as $Θ(\log T)$; second, we develop a novel learning algorithm and show that it has $O(1)$ regret. To the best of our knowledge, this is the first learning algorithm with bounded AoI regret.
PFNov 14, 2020
RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing SystemsBai Liu, Qiaomin Xie, Eytan Modiano
With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space, while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
PFMay 11, 2020
Learning Algorithms for Minimizing Queue Length RegretThomas Stahlbuhk, Brooke Shrader, Eytan Modiano
We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive to the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and a controller that knows the rates, a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely-related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $Ω(\log{T})$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation.
MLSep 24, 2019
A Theory of Uncertainty Variables for State Estimation and InferenceRajat Talak, Sertac Karaman, Eytan Modiano
We develop a new framework of uncertainty variables to model uncertainty. An uncertainty variable is characterized by an uncertainty set, in which its realization is bound to lie, while the conditional uncertainty is characterized by a set map, from a given realization of a variable to a set of possible realizations of another variable. We prove Bayes' law and the law of total probability equivalents for uncertainty variables. We define a notion of independence, conditional independence, and pairwise independence for a collection of uncertainty variables, and show that this new notion of independence preserves the properties of independence defined over random variables. We then develop a graphical model, namely Bayesian uncertainty network, a Bayesian network equivalent defined over a collection of uncertainty variables, and show that all the natural conditional independence properties, expected out of a Bayesian network, hold for the Bayesian uncertainty network. We also define the notion of point estimate, and show its relation with the maximum a posteriori estimate. Probability theory starts with a distribution function (equivalently a probability measure) as a primitive and builds all other useful concepts, such as law of total probability, Bayes' law, independence, graphical models, point estimate, on it. Our work shows that it is perfectly possible to start with a set, instead of a distribution function, and retain all the useful ideas needed for state estimation and inference.
LGOct 24, 2018
Learning to Route Efficiently with End-to-End Feedback: The Value of Networked StructureRuihao Zhu, Eytan Modiano
We introduce efficient algorithms which achieve nearly optimal regrets for the problem of stochastic online shortest path routing with end-to-end feedback. The setting is a natural application of the combinatorial stochastic bandits problem, a special case of the linear stochastic bandits problem. We show how the difficulties posed by the large scale action set can be overcome by the networked structure of the action set. Our approach presents a novel connection between bandit learning and shortest path algorithms. Our main contribution is an adaptive exploration algorithm with nearly optimal instance-dependent regret for any directed acyclic network. We then modify it so that nearly optimal worst case regret is achieved simultaneously. Driven by the carefully designed Top-Two Comparison (TTC) technique, the algorithms are efficiently implementable. We further conduct extensive numerical experiments to show that our proposed algorithms not only achieve superior regret performances, but also reduce the runtime drastically.
LGJun 4, 2018
Data-driven Localization and Estimation of Disturbance in the Interconnected Power SystemHyang-Won Lee, Jianan Zhang, Eytan Modiano
Identifying the location of a disturbance and its magnitude is an important component for stable operation of power systems. We study the problem of localizing and estimating a disturbance in the interconnected power system. We take a model-free approach to this problem by using frequency data from generators. Specifically, we develop a logistic regression based method for localization and a linear regression based method for estimation of the magnitude of disturbance. Our model-free approach does not require the knowledge of system parameters such as inertia constants and topology, and is shown to achieve highly accurate localization and estimation performance even in the presence of measurement noise and missing data.
AIFeb 19, 2018
Accelerated Primal-Dual Policy Optimization for Safe Reinforcement LearningQingkai Liang, Fanyu Que, Eytan Modiano
Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs only use on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster convergence than state-of-the-art approaches for CMDPs.
NISep 6, 2017
Throughput Optimal Decentralized Scheduling of Multi-Hop Networks with End-to-End Deadline Constraints: II Wireless Networks with InterferenceRahul Singh, P. R. Kumar, Eytan Modiano
Consider a multihop wireless network serving multiple flows in which wireless link interference constraints are described by a link interference graph. For such a network, we design routing-scheduling policies that maximize the end-to-end timely throughput of the network. Timely throughput of a flow $f$ is defined as the average rate at which packets of flow $f$ reach their destination node $d_f$ within their deadline. Our policy has several surprising characteristics. Firstly, we show that the optimal routing-scheduling decision for an individual packet that is present at a wireless node $i\in V$ is solely a function of its location, and "age". Thus, a wireless node $i$ does not require the knowledge of the "global" network state in order to maximize the timely throughput. We notice that in comparison, under the backpressure routing policy, a node $i$ requires only the knowledge of its neighbours queue lengths in order to guarantee maximal stability, and hence is decentralized. The key difference arises due to the fact that in our set-up the packets loose their utility once their "age" has crossed their deadline, thus making the task of optimizing timely throughput much more challenging than that of ensuring network stability. Of course, due to this key difference, the decision process involved in maximizing the timely throughput is also much more complex than that involved in ensuring network-wide queue stabilization. In view of this, our results are somewhat surprising.
SYSep 8, 2017
Risk-Sensitive Optimal Control of QueuesRahul Singh, Xueying Guo, Eytan Modiano
We consider the problem of designing risk-sensitive optimal control policies for scheduling packet transmissions in a stochastic wireless network. A single client is connected to an access point (AP) through a wireless channel. Packet transmission incurs a cost $C$, while packet delivery yields a reward of $R$ units. The client maintains a finite buffer of size $B$, and a penalty of $L$ units is imposed upon packet loss which occurs due to finite queueing buffer. We show that the risk-sensitive optimal control policy for such a simple set-up is of threshold type, i.e., it is optimal to carry out packet transmissions only when $Q(t)$, i.e., the queue length at time $t$ exceeds a certain threshold $τ$. It is also shown that the value of threshold $τ$ increases upon increasing the cost per unit packet transmission $C$. Furthermore, it is also shown that a threshold policy with threshold equal to $τ$ is optimal for a set of problems in which cost $C$ lies within an interval $[C_l,C_u]$. Equations that need to be solved in order to obtain $C_l,C_u$ are also provided.
OCMay 6, 2015
Modeling the Impact of Communication Loss on the Power Grid under Emergency ControlMarzieh Parandehgheibi, Konstantin Turitsyn, Eytan Modiano
We study the interaction between the power grid and the communication network used for its control. We design a centralized emergency control scheme under both full and partial communication support, to improve the performance of the power grid. We use our emergency control scheme to model the impact of communication loss on the grid. We show that unlike previous models used in the literature, the loss of communication does not necessarily lead to the failure of the correspondent power nodes; i.e. the "point-wise" failure model is not appropriate. In addition, we show that the impact of communication loss is a function of several parameters such as the size and structure of the power and communication failure, as well as the operating mode of power nodes disconnected from the communication network. Our model can be used to design the dependency between the power grid and the communication network used for its control, so as to maximize the benefit in terms of intelligent control, while minimizing the risks due to loss of communication.