Dependability Evaluation of Middleware Technology for Large-scale Distributed Caching
This work addresses the reliability of distributed caching systems used by service providers that handle millions of concurrent clients. Its contribution is incremental, since it evaluates existing middleware platforms.
The paper evaluates the dependability of middleware platforms for large-scale distributed caching systems. It finds that the platforms achieve different availability and performance trade-offs under faults, and it identifies scenarios in which a few faulty components cause cascading failures.
Distributed caching systems (e.g., Memcached) are widely used by service providers to serve requests from millions of concurrent clients. Given their large scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three popular middleware platforms, namely Twemproxy by Twitter, Mcrouter by Facebook, and Dynomite by Netflix, to assess availability and performance under faults, including failures of Memcached nodes and congestion due to unbalanced workloads and network link bandwidth bottlenecks. We point out the different availability and performance trade-offs achieved by the three platforms, and scenarios in which a few faulty components cause cascading failures of the whole distributed system.
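
To make the notion of a cascading failure concrete, the following is a minimal toy model and not the evaluation method used in this work. It assumes that the middleware spreads a fixed total load evenly over the surviving cache nodes and that a node fails as soon as its share exceeds its capacity. The function name `simulate_cascade`, the capacities, and the load values are all hypothetical and chosen only for illustration.

```python
def simulate_cascade(capacities, total_load, initially_failed):
    """Toy model of load redistribution after node failures.

    capacities: per-node capacity (hypothetical units, e.g. requests/s).
    total_load: total load, split evenly over surviving nodes (an assumption).
    initially_failed: set of node indices that are faulty at the start.

    Returns (failed_nodes, rounds), where rounds counts the redistribution
    steps that knocked out at least one additional node.
    """
    failed = set(initially_failed)
    rounds = 0
    while True:
        alive = [i for i in range(len(capacities)) if i not in failed]
        if not alive:
            return failed, rounds  # the whole system is down
        share = total_load / len(alive)
        # A node fails when its share of the load exceeds its capacity.
        overloaded = {i for i in alive if capacities[i] < share}
        if not overloaded:
            return failed, rounds  # stable: survivors absorb the load
        failed |= overloaded
        rounds += 1


capacities = [30, 30, 35, 40, 50]

# One faulty node: the other four take 25 units each and stay up.
print(simulate_cascade(capacities, 100, {4}))

# Two faulty nodes: survivors get about 33.3 each, the 30-unit nodes fail,
# and the last node then faces the full load and fails as well.
print(simulate_cascade(capacities, 100, {3, 4}))
```

In this sketch a single faulty node is tolerated, while two faulty nodes out of five bring down every node within two redistribution rounds. Real platforms use richer policies (for example consistent hashing, replication, and failover pools), so the model only illustrates the mechanism and makes no claim about the behavior of any of the three evaluated platforms.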