DC AI LG NI

Oct 21, 2021

Model-based Reinforcement Learning for Service Mesh Fault Resiliency in a Web Application-level

Fanfei Meng, Lalita Jagadeesan, Marina Thottan

arXiv:2110.13621v15.114 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of configuring fault resilience attributes in service meshes for web applications. The contribution is incremental: it applies an existing method to a new domain.

The paper tackles the problem of optimizing service mesh fault resiliency in microservice-based web applications. It uses a model-based reinforcement learning workflow to predict the most significant fault resilience behaviors, extending from single-service to multi-service management through efficient agent collaboration.

Microservice-based architectures enable different aspects of web applications to be created and updated independently, even after deployment. Associated technologies such as service mesh provide application-level fault resilience through attribute configurations that govern the behavior of request-response services, and the interactions among them, in the presence of failures. While this provides tremendous flexibility, the configured values of these attributes, and the relationships among them, can significantly affect the performance and fault resilience of the overall application. Furthermore, it is impossible to determine the best and worst combinations of attribute values with respect to fault resiliency via testing, due to the complexities of the underlying distributed system and the many possible attribute value combinations. In this paper, we present a model-based reinforcement learning workflow for service mesh fault resiliency. Our approach enables the prediction of the most significant fault resilience behaviors at the web application level, scaling from single-service to aggregated multi-service management with efficient agent collaboration.
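To make the idea of predicting the best and worst attribute combinations from a learned model concrete, here is a minimal sketch. It is not the paper's workflow: it assumes a toy simulator with two hypothetical attributes (a timeout and a retry count), invented per-attempt success probabilities, and a simple learned model that observes only the zero-retry configurations and then predicts the rest. All names (`simulate_request`, `learn_attempt_model`, `predict_success`) and values are illustrative assumptions.

```python
import itertools
import random

# Hypothetical attribute grid; the values are assumptions, not from the paper.
TIMEOUTS_MS = [100, 500, 1000]
RETRIES = [0, 1, 3]


def simulate_request(timeout_ms, retries, rng):
    """Toy environment: return 1 if any attempt succeeds, else 0.

    The per-attempt success probabilities are invented for illustration.
    """
    p_attempt = {100: 0.3, 500: 0.6, 1000: 0.8}[timeout_ms]
    for _ in range(retries + 1):
        if rng.random() < p_attempt:
            return 1
    return 0


def learn_attempt_model(samples, rng):
    """Estimate per-attempt success rates from zero-retry observations only."""
    return {
        t: sum(simulate_request(t, 0, rng) for _ in range(samples)) / samples
        for t in TIMEOUTS_MS
    }


def predict_success(model, timeout_ms, retries):
    """Predict the success rate of a configuration, including unobserved ones.

    Assumes attempts are independent, which holds in this toy simulator.
    """
    return 1 - (1 - model[timeout_ms]) ** (retries + 1)


rng = random.Random(0)
model = learn_attempt_model(2000, rng)

# Rank every configuration using the learned model instead of testing each one.
configs = list(itertools.product(TIMEOUTS_MS, RETRIES))
best = max(configs, key=lambda c: predict_success(model, *c))
worst = min(configs, key=lambda c: predict_success(model, *c))
print("predicted best:", best, "predicted worst:", worst)
```

The sketch observes three of the nine configurations and predicts the remaining six, which is the sense in which a learned model can stand in for exhaustive testing. A real service mesh has many more attributes and interacting services, so the independence assumption used here would not carry over directly.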
