Holger Karl

LG
h-index23
14papers
355citations
Novelty48%
AI Score41

14 Papers

SYMay 27, 2020
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems

Alex S. Leong, Arunselvan Ramaswamy, Daniel E. Quevedo et al.

In many Cyber-Physical Systems, we encounter the problem of remote state estimation of geographically distributed and remote physical processes. This paper studies the scheduling of sensor transmissions to estimate the states of multiple remote, dynamic processes. Information from the different sensors have to be transmitted to a central gateway over a wireless network for monitoring purposes, where typically fewer wireless channels are available than there are processes to be monitored. For effective estimation at the gateway, the sensors need to be scheduled appropriately, i.e., at each time instant one needs to decide which sensors have network access and which ones do not. To address this scheduling problem, we formulate an associated Markov decision process (MDP). This MDP is then solved using a Deep Q-Network, a recent deep reinforcement learning algorithm that is at once scalable and model-free. We compare our scheduling algorithm to popular scheduling algorithms such as round-robin and reduced-waiting-time, among others. Our algorithm is shown to significantly outperform these algorithms for many example scenarios.

MAApr 5, 2022
Multi-Agent Distributed Reinforcement Learning for Making Decentralized Offloading Decisions

Jing Tan, Ramin Khalili, Holger Karl et al.

We formulate computation offloading as a decentralized decision-making problem with autonomous agents. We design an interaction mechanism that incentivizes agents to align private and system goals by balancing between competition and cooperation. The mechanism provably has Nash equilibria with optimal resource allocation in the static case. For a dynamic environment, we propose a novel multi-agent online learning algorithm that learns with partial, delayed and noisy state information, and a reward signal that reduces information need to a great extent. Empirical results confirm that through learning, agents significantly improve both system and individual performance, e.g., 40% offloading failure rate reduction, 32% communication overhead reduction, up to 38% computation resource savings in low contention, 18% utilization increase with reduced load variation in high contention, and improvement in fairness. Results also confirm the algorithm's good convergence and generalization property in significantly different environments.

MAJul 29, 2022
Multi-Agent Reinforcement Learning for Long-Term Network Resource Allocation through Auction: a V2X Application

Jing Tan, Ramin Khalili, Holger Karl et al.

We formulate offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as decentralized decision making among autonomous agents. We design an interaction mechanism that incentivizes such agents to align private and system goals by balancing between competition and cooperation. In the static case, the mechanism provably has Nash equilibria with optimal resource allocation. In a dynamic environment, this mechanism's requirement of complete information is impossible to achieve. For such environments, we propose a novel multi-agent online learning algorithm that learns with partial, delayed and noisy state information, thus greatly reducing information need. Our algorithm is also capable of learning from long-term and sparse reward signals with varying delay. Empirical results from the simulation of a V2X application confirm that through learning, agents with the learning algorithm significantly improve both system and individual performance, reducing up to 30% of offloading failure rate, communication overhead and load variation, increasing computation resource utilization and fairness. Results also confirm the algorithm's good convergence and generalization property in different environments.

LGApr 5, 2022
Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games

Jing Tan, Ramin Khalili, Holger Karl

We propose a multi-agent distributed reinforcement learning algorithm that balances between potentially conflicting short-term reward and sparse, delayed long-term reward, and learns with partial information in a dynamic environment. We compare different long-term rewards to incentivize the algorithm to maximize individual payoff and overall social welfare. We test the algorithm in two simulated auction games, and demonstrate that 1) our algorithm outperforms two benchmark algorithms in a direct competition, with cost to social welfare, and 2) our algorithm's aggressive competitive behavior can be guided with the long-term reward signal to maximize both individual payoff and overall social welfare.

9.6NIMar 24
PNap: Lifecycle-aware Edge Multi-state sleep for Energy Efficient MEC

Federico Giarrè, Holger Karl

Multi-access Edge Computings (MECs) enables low-latency services by executing applications at the network edge. To fulfill low-latency requirements of mobile users, providers have to keep multiple edge servers running at multiple locations, even when, in low-load phases, their capacity is not needed. This significantly increases energy consumption. Multi-state sleep mechanisms mitigate this issue by allowing servers to enter progressively deeper sleep states, trading energy savings for longer wake-up delays. At the same time, service execution depends on non-instantaneous lifecycle operations that cannot be performed while servers are asleep, tightly coupling energy management with service continuity. This paper introduces PowerNap (PNap), a lifecycle-aware orchestration framework that jointly manages server sleep states and service lifecycle states. By leveraging traffic forecasting, PNap jointly minimizes the number of active edge servers and service disruptions. We compare PNap against baselines approaches and a state-of-the-art approach. Results validate PNap, showing how it can reduce energy consumption by up to 14.9% with respect to a state-of-the-art solution while matching its service availability results.

LGMar 13, 2024
Multi-Objective Optimization Using Adaptive Distributed Reinforcement Learning

Jing Tan, Ramin Khalili, Holger Karl

The Intelligent Transportation System (ITS) environment is known to be dynamic and distributed, where participants (vehicle users, operators, etc.) have multiple, changing and possibly conflicting objectives. Although Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS applications such as resource management and offloading, most RL algorithms focus on single objectives. In many situations, converting a multi-objective problem into a single-objective one is impossible, intractable or insufficient, making such RL algorithms inapplicable. We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements, which automatically triggers adaptive few-shot learning in a dynamic, distributed and noisy environment with sparse and delayed reward. We test our algorithm in an ITS environment with edge cloud computing. Empirical results show that the algorithm is quick to adapt to new environments and performs better in all individual and system metrics compared to the state-of-the-art benchmark. Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method. In addition to the cloud simulation, we test our algorithm on a single-board computer and show that it can make inference in 6 milliseconds.

LGOct 14, 2024
Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Andreas Boltres, Niklas Freymuth, Patrick Jahnke et al.

Finding efficient routes for data packets is an essential task in computer networking. The optimal routes depend greatly on the current network topology, state and traffic demand, and they can change within milliseconds. Reinforcement Learning can help to learn network representations that provide routing decisions for possibly novel situations. So far, this has commonly been done using fluid network models. We investigate their suitability for millisecond-scale adaptations with a range of traffic mixes and find that packet-level network models are necessary to capture true dynamics, in particular in the presence of TCP traffic. To this end, we present $\textit{PackeRL}$, the first packet-level Reinforcement Learning environment for routing in generic network topologies. Our experiments confirm that learning-based strategies that have been trained in fluid environments do not generalize well to this more realistic, but more challenging setup. Hence, we also introduce two new algorithms for learning sub-second Routing Optimization. We present $\textit{M-Slim}$, a dynamic shortest-path algorithm that excels at high traffic volumes but is computationally hard to scale to large network topologies, and $\textit{FieldLines}$, a novel next-hop policy design that re-optimizes routing for any network topology within milliseconds without requiring any re-training. Both algorithms outperform current learning-based approaches as well as commonly used static baseline protocols in scenarios with high-traffic volumes. All findings are backed by extensive experiments in realistic network conditions in our fast and versatile training and evaluation framework.

OCMay 11, 2023
Stability and Convergence of Distributed Stochastic Approximations with large Unbounded Stochastic Information Delays

Adrian Redder, Arunselvan Ramaswamy, Holger Karl

We generalize the Borkar-Meyn stability Theorem (BMT) to distributed stochastic approximations (SAs) with information delays that possess an arbitrary moment bound. To model the delays, we introduce Age of Information Processes (AoIPs): stochastic processes on the non-negative integers with a unit growth property. We show that AoIPs with an arbitrary moment bound cannot exceed any fraction of time infinitely often. In combination with a suitably chosen stepsize, this property turns out to be sufficient for the stability of distributed SAs. Compared to the BMT, our analysis requires crucial modifications and a new line of argument to handle the SA errors caused by AoI. In our analysis, we show that these SA errors satisfy a recursive inequality. To evaluate this recursion, we propose a new Gronwall-type inequality for time-varying lower limits of summations. As applications to our distributed BMT, we discuss distributed gradient-based optimization and a new approach to analyzing SAs with momentum.

OCJan 27, 2022
Distributed gradient-based optimization in the presence of dependent aperiodic communication

Adrian Redder, Arunselvan Ramaswamy, Holger Karl

Iterative distributed optimization algorithms involve multiple agents that communicate with each other, over time, in order to minimize/maximize a global objective. In the presence of unreliable communication networks, the Age-of-Information (AoI), which measures the freshness of data received, may be large and hence hinder algorithmic convergence. In this paper, we study the convergence of general distributed gradient-based optimization algorithms in the presence of communication that neither happens periodically nor at stochastically independent points in time. We show that convergence is guaranteed provided the random variables associated with the AoI processes are stochastically dominated by a random variable with finite first moment. This improves on previous requirements of boundedness of more than the first moment. We then introduce stochastically strongly connected (SSC) networks, a new stochastic form of strong connectedness for time-varying networks. We show: If for any $p \ge0$ the processes that describe the success of communication between agents in a SSC network are $α$-mixing with $n^{p-1}α(n)$ summable, then the associated AoI processes are stochastically dominated by a random variable with finite $p$-th moment. In combination with our first contribution, this implies that distributed stochastic gradient descend converges in the presence of AoI, if $α(n)$ is summable.

LGJan 3, 2022
3DPG: Distributed Deep Deterministic Policy Gradient Algorithms for Networked Multi-Agent Systems

Adrian Redder, Arunselvan Ramaswamy, Holger Karl

We present Distributed Deep Deterministic Policy Gradient (3DPG), a multi-agent actor-critic (MAAC) algorithm for Markov games. Unlike previous MAAC algorithms, 3DPG is fully distributed during both training and deployment. 3DPG agents calculate local policy gradients based on the most recently available local data (states, actions) and local policies of other agents. During training, this information is exchanged using a potentially lossy and delaying communication network. The network therefore induces Age of Information (AoI) for data and policies. We prove the asymptotic convergence of 3DPG even in the presence of potentially unbounded Age of Information (AoI). This provides an important step towards practical online and distributed multi-agent learning since 3DPG does not assume information to be available deterministically. We analyze 3DPG in the presence of policy and data transfer under mild practical assumptions. Our analysis shows that 3DPG agents converge to a local Nash equilibrium of Markov games in terms of utility functions expressed as the expected value of the agents local approximate action-value functions (Q-functions). The expectations of the local Q-functions are with respect to limiting distributions over the global state-action space shaped by the agents' accumulated local experiences. Our results also shed light on the policies obtained by general MAAC algorithms. We show through a heuristic argument and numerical experiments that 3DPG improves convergence over previous MAAC algorithms that use old actions instead of old policies during training. Further, we show that 3DPG is robust to AoI; it learns competitive policies even with large AoI and low data availability.

NIOct 4, 2021
Reinforcement Learning for Admission Control in Wireless Virtual Network Embedding

Haitham Afifi, Fabian Sauer, Holger Karl

Using Service Function Chaining (SFC) in wireless networks became popular in many domains like networking and multimedia. It relies on allocating network resources to incoming SFCs requests, via a Virtual Network Embedding (VNE) algorithm, so that it optimizes the performance of the SFC. When the load of incoming requests -- competing for the limited network resources - increases, it becomes challenging to decide which requests should be admitted and which one should be rejected. In this work, we propose a deep Reinforcement learning (RL) solution that can learn the admission policy for different dependencies, such as the service lifetime and the priority of incoming requests. We compare the deep RL solution to a first-come-first-serve baseline that admits a request whenever there are available resources. We show that deep RL outperforms the baseline and provides higher acceptance rate with low rejections even when there are enough resources.

SYMar 8, 2018
DeepCAS: A Deep Reinforcement Learning Algorithm for Control-Aware Scheduling

Burak Demirel, Arunselvan Ramaswamy, Daniel E. Quevedo et al.

We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based \emph{control-aware} scheduling (\textsc{DeepCAS}) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the \emph{control loss}. We present empirical results to show that \textsc{DeepCAS} finds schedules with better performance than periodic ones.

SEMay 19, 2016
SONATA: Service Programming and Orchestration for Virtualized Software Networks

Sevil Dräxler, Manuel Peuster, Holger Karl et al.

In conventional large-scale networks, creation and management of network services are costly and complex tasks that often consume a lot of resources, including time and manpower. Network softwarization and network function virtualization have been introduced to tackle these problems. They replace the hardware-based network service components and network control mechanisms with software components running on general-purpose hardware, aiming at decreasing costs and complexity of implementing new services, maintaining the implemented services, and managing available resources in service provisioning platforms and underlying infrastructures. To experience the full potential of these approaches, innovative development support tools and service provisioning environments are needed. To answer these needs, we introduce the SONATA architecture, a service programming, orchestration, and management framework. We present a development toolchain for virtualized network services, fully integrated with a service platform and orchestration system. We motivate the modular and flexible architecture of our system and discuss its main components and features, such as function- and service-specific managers that allow fine- grained service management, slicing support to facilitate multi-tenancy, recursiveness for improved scalability, and full-featured DevOps support.

NISep 21, 2013
Anticipatory Buffer Control and Quality Selection for Wireless Video Streaming

Martin Dräxler, Johannes Blobel, Philipp Dreimann et al.

Video streaming is in high demand by mobile users, as recent studies indicate. In cellular networks, however, the unreliable wireless channel leads to two major problems. Poor channel states degrade video quality and interrupt the playback when a user cannot sufficiently fill its local playout buffer: buffer underruns occur. In contrast to that, good channel conditions cause common greedy buffering schemes to pile up very long buffers. Such over-buffering wastes expensive wireless channel capacity. To keep buffering in balance, we employ a novel approach. Assuming that we can predict data rates, we plan the quality and download time of the video segments ahead. This anticipatory scheduling avoids buffer underruns by downloading a large number of segments before a channel outage occurs, without wasting wireless capacity by excessive buffering. We formalize this approach as an optimization problem and derive practical heuristics for segmented video streaming protocols (e.g., HLS or MPEG DASH). Simulation results and testbed measurements show that our solution essentially eliminates playback interruptions without significantly decreasing video quality.