Jun 25, 2023

A Framework for dynamically meeting performance objectives on a service mesh

arXiv:2306.14178v1 | 3 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses performance management for services on a service mesh, but it is incremental because it builds on existing RL and simulation techniques.

The authors tackle the problem of dynamically managing multiple services on a service mesh so that they meet performance objectives such as delay bounds and throughput targets. They use reinforcement learning to train an agent that reallocates resources, and they speed up training by orders of magnitude by running it on a simulator instead of the testbed.
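To make the idea of training a resource-reallocation agent on a simulator concrete, here is a minimal sketch under loudly hypothetical assumptions: a single service whose replicas behave as M/M/1 queues, three offered-load levels, a 0.3 s delay bound, and a fixed cost per replica. It uses plain tabular Q-learning as a stand-in; it is not the authors' RL method, simulator, or action set, and every constant and function name (`mean_delay`, `step`, `train`, `greedy`) is illustrative. At each control interval the agent picks an action (remove, keep, or add a replica) and learns only from simulated rewards, never touching a live testbed.

```python
import random

# Hypothetical toy model (assumption, not from the paper): one service, M/M/1 replicas.
SERVICE_RATE = 10.0          # requests/s one replica can serve (assumed)
LOADS = [5.0, 12.0, 18.0]    # offered load levels in requests/s (assumed)
MAX_REPLICAS = 3
DELAY_BOUND = 0.3            # delay objective in seconds (assumed)
COST_PER_REPLICA = 0.1       # cost weight per allocated replica (assumed)
ACTIONS = (-1, 0, 1)         # remove a replica, keep, add a replica


def mean_delay(load, replicas):
    """M/M/1 mean sojourn time with the load split evenly across replicas."""
    per_replica = load / replicas
    if per_replica >= SERVICE_RATE:
        return float("inf")  # queue is unstable, delay grows without bound
    return 1.0 / (SERVICE_RATE - per_replica)


def step(load_idx, replicas, action, rng):
    """Simulator step: apply the action, score it, then draw the next load level."""
    new_replicas = min(max(replicas + action, 1), MAX_REPLICAS)
    met = mean_delay(LOADS[load_idx], new_replicas) <= DELAY_BOUND
    reward = (1.0 if met else 0.0) - COST_PER_REPLICA * new_replicas
    next_load_idx = rng.randrange(len(LOADS))
    return (next_load_idx, new_replicas), reward


def greedy(q, state):
    """Control action the trained agent takes in a (load index, replicas) state."""
    return max(ACTIONS, key=lambda a: q[(*state, a)])


def train(steps=30000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning against the simulator (stand-in for the paper's RL)."""
    rng = random.Random(seed)
    q = {(l, r, a): 0.0
         for l in range(len(LOADS))
         for r in range(1, MAX_REPLICAS + 1)
         for a in ACTIONS}
    state = (0, 1)
    for _ in range(steps):
        # Epsilon-greedy exploration over the three control actions.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy(q, state)
        next_state, reward = step(*state, action, rng)
        best_next = max(q[(*next_state, a)] for a in ACTIONS)
        key = (*state, action)
        q[key] += alpha * (reward + gamma * best_next - q[key])
        state = next_state
    return q


q = train()
# Under high load (index 2) with 2 replicas, the learned policy should add a replica.
action = greedy(q, (2, 2))
```

Each simulated step is only a few arithmetic operations, so tens of thousands of training steps run quickly; on a real testbed every control interval would instead have to wait for fresh measurements, which is the gap the simulator approach exploits.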

We present a framework for achieving end-to-end management objectives for multiple services that concurrently execute on a service mesh. We apply reinforcement learning (RL) techniques to train an agent that periodically performs control actions to reallocate resources. We develop and evaluate the framework using a laboratory testbed where we run information and computing services on a service mesh, supported by the Istio and Kubernetes platforms. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, cost-related objectives, and service differentiation. We compute the control policies on a simulator rather than on the testbed, which speeds up the training time by orders of magnitude for the scenarios we study. Our proposed framework is novel in that it advocates a top-down approach whereby the management objectives are defined first and then mapped onto the available control actions. It allows us to execute several types of control actions simultaneously. By first learning the system model and the operating region from testbed traces, we can train the agent for different management objectives in parallel.
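The top-down idea, fixing the management objectives first and only then deriving what the agent optimizes, can be sketched as compiling declarative objectives into a reward function. This is an illustrative assumption, not the paper's formulation: the `Objective` fields, the priority weighting used for service differentiation, the linear CPU cost term, and the service names `info_service` and `compute_service` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Objective:
    """Hypothetical per-service management objective (illustrative only)."""
    delay_bound_s: float        # end-to-end delay bound on service requests
    min_throughput_rps: float   # throughput objective in requests/s
    priority: float = 1.0       # weight used for service differentiation


def make_reward(objectives: Dict[str, Objective], cost_weight: float):
    """Top-down: objectives are defined first, then compiled into a reward."""
    def reward(delays: Dict[str, float],
               throughputs: Dict[str, float],
               allocated_cpu: float) -> float:
        total = 0.0
        for name, obj in objectives.items():
            met = (delays[name] <= obj.delay_bound_s
                   and throughputs[name] >= obj.min_throughput_rps)
            # Services that meet both objectives earn their priority weight.
            total += obj.priority * (1.0 if met else 0.0)
        # Cost-related objective: penalize allocated resources linearly (assumed).
        return total - cost_weight * allocated_cpu
    return reward


objectives = {"info_service": Objective(0.5, 100.0, priority=2.0),
              "compute_service": Objective(2.0, 20.0)}
reward = make_reward(objectives, cost_weight=0.1)
# info_service meets its objectives; compute_service misses its delay bound.
r = reward({"info_service": 0.4, "compute_service": 2.5},
           {"info_service": 120.0, "compute_service": 25.0},
           allocated_cpu=4.0)
```

Because a new management objective only changes the input to `make_reward`, agents for different objectives could be trained in parallel against the same learned system model, in the spirit of the framework described above.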
