Harnessing the Power of Serverless Runtimes for Large-Scale Optimization
This work addresses the challenge of scaling optimization algorithms efficiently and cost-effectively for researchers and practitioners in machine learning and distributed computing, though it is incremental in applying existing serverless infrastructure to a new type of problem.
The authors tackled the problem of solving large-scale optimization problems using serverless runtimes, which have so far been used mostly for stateless computations, and demonstrated relative speedups with up to 256 workers and efficiencies above 70% with up to 64 workers on a regularized logistic regression problem.
The event-driven and elastic nature of serverless runtimes makes them an efficient and cost-effective alternative for scaling up computations. So far, they have mostly been used for stateless, data-parallel, and ephemeral computations. In this work, we propose using serverless runtimes to solve generic, large-scale optimization problems. Specifically, we build a master-worker setup using AWS Lambda as the source of our workers, implement a parallel optimization algorithm to solve a regularized logistic regression problem, and show that relative speedups can be expected with up to 256 workers and efficiencies above 70% with up to 64 workers. We also identify possible algorithmic and system-level bottlenecks, propose improvements, and discuss the limitations and challenges in realizing these improvements.
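To make the master-worker pattern and the reported metrics concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the parallel optimization algorithm, so plain gradient descent is used as a stand-in; a local thread pool stands in for AWS Lambda invocations; and the function names (`worker_gradient`, `master_solve`, `relative_speedup`, `efficiency`) and all parameter values are hypothetical. Efficiency is taken here in its usual sense, relative speedup divided by the increase in worker count, which may differ in detail from the definition used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def worker_gradient(w, X_shard, y_shard):
    """Stateless worker: summed logistic-loss gradient over one data shard.

    Labels are assumed to be in {-1, +1}. In a serverless deployment this
    would be the body of a function invocation (assumption, not from the source).
    """
    margins = y_shard * (X_shard @ w)
    coeff = -y_shard / (1.0 + np.exp(margins))
    return X_shard.T @ coeff


def master_solve(X, y, n_workers=4, lam=0.1, step=0.5, iters=200):
    """Master: broadcast w, gather partial gradients, take a gradient step.

    Minimizes (1/n) * sum(log(1 + exp(-y_i * x_i.w))) + (lam/2) * ||w||^2.
    Gradient descent is a stand-in for the unspecified parallel algorithm.
    """
    n, d = X.shape
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    w = np.zeros(d)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(iters):
            # Each worker receives the current iterate and its own shard.
            partials = list(
                pool.map(worker_gradient, [w] * n_workers, X_shards, y_shards)
            )
            grad = sum(partials) / n + lam * w
            w = w - step * grad
    return w


def relative_speedup(t_ref, t_p):
    """Speedup of a run taking t_p seconds relative to a reference run."""
    return t_ref / t_p


def efficiency(t_ref, t_p, p_ref, p):
    """Relative speedup divided by the factor by which workers increased."""
    return relative_speedup(t_ref, t_p) * p_ref / p
```

For example, if a hypothetical run took 10.0 s with 1 worker and 2.5 s with 4 workers, `efficiency(10.0, 2.5, 1, 4)` evaluates to 1.0, i.e. perfect scaling. In this sketch the master synchronizes with every worker at each iteration, which is one place where communication and straggler overheads of the kind the abstract alludes to could reduce efficiency at larger worker counts.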