High-Dimensional Bayesian Optimization with Multi-Task Learning for RocksDB
This work addresses the challenge of high-dimensional tuning for RocksDB, a widely used key-value store, offering incremental improvements in optimization efficiency for database performance.
The paper tackles auto-tuning of RocksDB's complex configuration to maximize IO throughput. It proposes multi-task modeling and manual parameter grouping, achieving a 1.3x improvement in throughput and converging in 10 optimization steps, compared with 50 for other methods.
RocksDB is a general-purpose embedded key-value store used in many different settings. Its versatility comes at the cost of complex tuning configurations. This paper investigates maximizing the throughput of RocksDB IO operations by auto-tuning ten parameters of varying ranges. Off-the-shelf optimizers struggle with high-dimensional problem spaces and require a large number of training samples. We propose two techniques to tackle this problem: multi-task modeling and dimensionality reduction through manual grouping of parameters. With adjacent optimization incorporated, the model converged faster and found complicated settings that other tuners could not find. This approach incurred additional computational overhead, which we mitigated by manually assigning parameters to each sub-goal using our knowledge of RocksDB. The model is then incorporated in a standard Bayesian Optimization loop to find parameters that maximize RocksDB's IO throughput. Our method achieved a 1.3x throughput improvement when benchmarked against a simulation of Facebook's social graph traffic, and converged in ten optimization steps, whereas other state-of-the-art methods required fifty.
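For readers unfamiliar with the "standard Bayesian Optimization loop" the abstract refers to, the following is a minimal sketch of such a loop in plain Python. It is not the paper's method: it uses a single-task Gaussian process with an upper-confidence-bound acquisition rule, with no multi-task modeling and no parameter grouping. The objective `toy_throughput`, the two-dimensional unit-cube search space, the kernel length scale, and every other name and constant here are assumptions chosen only for illustration; a real run would replace the objective with an actual RocksDB benchmark.

```python
import math
import random


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two points."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular L with L * L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-4):
    """Factor the kernel matrix once and precompute alpha = K^-1 y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return L, solve_upper(L, solve_lower(L, y))


def predict(X, L, alpha, x_new):
    """Posterior mean and standard deviation at x_new (unit prior variance)."""
    k = [rbf(a, x_new) for a in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def toy_throughput(x):
    """Hypothetical stand-in for one benchmark run; peak at (0.7, 0.3)."""
    return 100.0 - 200.0 * ((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)


def bayes_opt(objective, dim=2, n_init=3, n_steps=10, n_cand=200,
              kappa=2.0, seed=0):
    rng = random.Random(seed)
    # Start from a few random configurations in the unit cube.
    X = [[rng.random() for _ in range(dim)] for _ in range(n_init)]
    y = [objective(x) for x in X]
    for _ in range(n_steps):
        # Standardize observations to match the zero-mean, unit-variance prior.
        mu = sum(y) / len(y)
        sd = math.sqrt(sum((v - mu) ** 2 for v in y) / len(y)) or 1.0
        L, alpha = fit_gp(X, [(v - mu) / sd for v in y])
        cands = [[rng.random() for _ in range(dim)] for _ in range(n_cand)]

        def ucb(c):
            # Upper confidence bound: favor high mean or high uncertainty.
            m, s = predict(X, L, alpha, c)
            return m + kappa * s

        x_next = max(cands, key=ucb)
        X.append(x_next)
        y.append(objective(x_next))
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], y[best], y


best_x, best_y, history = bayes_opt(toy_throughput)
print(best_x, best_y)
```

The default of ten optimization steps simply mirrors the step count quoted in the abstract; the sketch makes no attempt to reproduce the reported results. Scoring random candidates instead of running a gradient-based acquisition optimizer keeps the example short, at the cost of scaling poorly as the number of tuned parameters grows, which is the difficulty the abstract attributes to off-the-shelf optimizers.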