A simple and effective predictive resource scaling heuristic for large-scale cloud applications
This work addresses predictive resource scaling for large-scale cloud applications, proposing a simple heuristic that compares favorably to existing methods.
The paper tackles predictive auto-scaling for cloud applications where resources can only be added with a delay and deployment throughput is limited. It proposes a policy that combines a probabilistic workload forecast with the application owner's risk aversion to make scaling decisions. In experiments with real-world and synthetic data, the policy compares favorably to both mathematically more sophisticated approaches and simple benchmark policies.
We propose a simple yet effective policy for the predictive auto-scaling of horizontally scalable applications running in cloud environments, where compute resources can only be added with a delay, and where the deployment throughput is limited. Our policy uses a probabilistic forecast of the workload to make scaling decisions dependent on the risk aversion of the application owner. We show in our experiments using real-world and synthetic data that this policy compares favorably to mathematically more sophisticated approaches as well as to simple benchmark policies.
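To make the idea concrete, the following is a minimal sketch of one way such a decision rule could look; it is an illustration under stated assumptions, not the policy proposed in the paper. It assumes that the probabilistic forecast is available as sampled workload values for the time at which instances requested now would become usable, that risk aversion is expressed as a quantile level between 0 and 1, and that the deployment throughput is a cap on instances requested per decision step. All names (`empirical_quantile`, `instances_to_request`, `risk_level`, `max_new_per_step`, and the other parameters) are hypothetical.

```python
import math


def empirical_quantile(samples, level):
    """Smallest sample value v such that at least `level` of the samples are <= v."""
    ordered = sorted(samples)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def instances_to_request(forecast_samples, running, pending,
                         capacity_per_instance, risk_level, max_new_per_step):
    """Hypothetical quantile-based scale-out rule (an assumption, not the paper's policy).

    forecast_samples: sampled workload values for the time step at which
        instances requested now would become available (now + provisioning delay).
    running, pending: instances already serving traffic / already being deployed.
    risk_level: quantile in (0, 1]; higher values mean a more risk-averse owner.
    max_new_per_step: cap modelling the limited deployment throughput.
    """
    # Plan for the workload level that is exceeded with probability ~(1 - risk_level).
    target_load = empirical_quantile(forecast_samples, risk_level)
    needed = math.ceil(target_load / capacity_per_instance)
    # Only the shortfall beyond running and already-pending instances is requested.
    shortfall = needed - (running + pending)
    # Never request a negative number, never exceed the deployment throughput.
    return max(0, min(shortfall, max_new_per_step))


if __name__ == "__main__":
    samples = [100, 150, 200, 400]  # made-up forecast samples (requests per second)
    for level in (0.5, 0.75, 1.0):
        n = instances_to_request(samples, running=2, pending=0,
                                 capacity_per_instance=50,
                                 risk_level=level, max_new_per_step=5)
        print(level, n)
```

In this sketch, raising `risk_level` makes the rule plan for a higher forecast quantile and therefore request more instances, while `max_new_per_step` bounds how quickly capacity can be added in any single step. Scale-in decisions and multi-step lookahead are deliberately left out.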