DC, Apr 20

Optimizing Memory Allocation in Distributed Clusters with Predictive Modeling

Jonathan Bader, Edgar Blumenthal, Marten Eckardt, Justus Krebs, Joel Witzke, Xemena Wysokinska, Haci Ismail Aslan, Odej Kao

arXiv:2604.18043 15.21 citations h-index: 11

AI Analysis

For operators of distributed clusters, this method reduces memory waste and failures, but the improvement is incremental over existing predictive approaches.

The paper tackles memory allocation in distributed clusters. It proposes a LightGBM and XGBoost ensemble with a multiplicative safety factor that reduces the share of under-allocated jobs from 4.17% to 2.89% and average overallocation from 148% to 44.51% on SAP build jobs.

In modern distributed systems, efficient resource allocation is vital for maintaining scalability, reducing operational costs, and ensuring fast execution, even across heterogeneous workloads. Predictive models for resource usage are essential tools for optimizing allocation and preventing system bottlenecks. A key challenge in predictive memory allocation is its asymmetric cost: underallocation causes failures, while overallocation wastes memory. We propose a regression method based on a LightGBM and XGBoost ensemble trained to predict high conditional quantiles. To further account for the high cost of underallocation, we add a multiplicative safety factor. With our method, we reduce the share of under-allocated jobs from 4.17% to 2.89% and the average overallocation from 148% to 44.51% on a real-world dataset of build jobs provided by SAP. We further explore the Pareto frontier between optimizing for underallocation and optimizing for overallocation.
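The abstract does not give implementation details, so the following is only a minimal sketch of the ideas it names, written in plain Python under stated assumptions. The pinball loss is the standard objective for quantile regression and shows why a high quantile penalizes under-prediction more heavily. The function names (`pinball_loss`, `allocate`, `evaluate`, `pareto_front`) are hypothetical, the per-job predictions stand in for the output of the LightGBM and XGBoost ensemble, and the overallocation metric (mean relative excess memory over all jobs) is an assumption that may differ from the paper's definition.

```python
from typing import List, Tuple


def pinball_loss(actual: float, predicted: float, q: float) -> float:
    """Quantile (pinball) loss: each unit of under-prediction costs q,
    each unit of over-prediction costs 1 - q."""
    diff = actual - predicted
    return q * diff if diff >= 0 else (q - 1.0) * diff


def allocate(predictions: List[float], safety_factor: float) -> List[float]:
    """Scale each predicted peak memory by a multiplicative safety factor."""
    return [p * safety_factor for p in predictions]


def evaluate(actual: List[float], allocated: List[float]) -> Tuple[float, float]:
    """Return (share of under-allocated jobs, mean relative overallocation).

    Assumption: overallocation is (allocated - actual) / actual, clipped at
    zero and averaged over all jobs; the paper may define it differently.
    """
    n = len(actual)
    under = sum(1 for a, m in zip(actual, allocated) if m < a)
    over = sum(max(m - a, 0.0) / a for a, m in zip(actual, allocated))
    return under / n, over / n


def pareto_front(
    points: List[Tuple[float, float, float]]
) -> List[Tuple[float, float, float]]:
    """Keep the (safety_factor, under_rate, over_mean) points that no other
    point beats or ties on both metrics while strictly beating it on one."""
    front = []
    for p in points:
        dominated = any(
            o[1] <= p[1] and o[2] <= p[2] and (o[1] < p[1] or o[2] < p[2])
            for o in points
        )
        if not dominated:
            front.append(p)
    return front
```

Sweeping the safety factor over a grid, calling `evaluate` for each value, and passing the resulting triples to `pareto_front` gives the candidate trade-offs between failures and wasted memory; an operator would then pick the point matching their tolerance for job failures.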
