DB · Mar 17

MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning

arXiv:2603.16450 · 91.7 · h-index: 9
AI Analysis

The paper addresses the challenge of optimizing Spark SQL performance for big data analytics. The contribution is incremental: it builds on multi-fidelity optimization but adapts it specifically to Spark SQL bottlenecks.

The paper tackles efficient Spark SQL configuration tuning by proposing MFTune, a multi-fidelity framework that combines query-based fidelity partitioning with density-based optimization. MFTune outperformed five state-of-the-art methods on the TPC-H and TPC-DS benchmarks within practical time constraints.

Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods predominantly rely on full-fidelity evaluations, which are extremely time-consuming, often leading to suboptimal performance within practical budgets. While multi-fidelity optimization offers a potential solution, directly applying standard techniques, such as data volume reduction or early stopping, proves ineffective for Spark SQL, as they fail to preserve performance correlations or represent true system bottlenecks. To address these challenges, we propose MFTune, an efficient multi-fidelity framework that introduces a query-based fidelity partitioning strategy, utilizing representative SQL subsets to provide accurate, low-cost proxies. To navigate the huge search space, MFTune incorporates a density-based optimization mechanism for automated knob and range compression, alongside an adapted transfer learning approach and a two-phase warm start to further accelerate the tuning process. Experimental results on TPC-H and TPC-DS benchmarks demonstrate that MFTune significantly outperforms five state-of-the-art tuning methods, identifying superior configurations within practical time constraints.
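The idea of using a query subset as a cheap proxy can be illustrated with a minimal sketch. This is not MFTune's algorithm: the abstract does not specify how subsets are chosen or how candidates are promoted, so the two-stage screening below is a generic multi-fidelity pattern. The knob names (`shuffle_partitions`, `executor_memory_gb`), their ranges, and the `simulated_runtime` cost model are hypothetical stand-ins for real Spark runs.

```python
import random
from typing import Callable, Dict, List, Sequence

Config = Dict[str, int]
RunQuery = Callable[[Config, str], float]


def sample_config(rng: random.Random) -> Config:
    """Draw one candidate configuration (hypothetical knobs and ranges)."""
    return {
        "shuffle_partitions": rng.choice([50, 100, 200, 400, 800]),
        "executor_memory_gb": rng.choice([2, 4, 8, 16]),
    }


def simulated_runtime(config: Config, query: str) -> float:
    """Toy cost model standing in for running one query on a cluster."""
    weight = 1 + sum(ord(c) for c in query) % 5  # per-query cost weight in 1..5
    partition_penalty = abs(config["shuffle_partitions"] - 200) / 200
    memory_penalty = 8 / config["executor_memory_gb"]
    return weight * (1.0 + partition_penalty + memory_penalty)


def evaluate(config: Config, queries: Sequence[str], run_query: RunQuery) -> float:
    """Total runtime of a configuration over a set of queries."""
    return sum(run_query(config, q) for q in queries)


def two_stage_search(
    candidates: List[Config],
    subset: Sequence[str],
    full_workload: Sequence[str],
    run_query: RunQuery,
    keep: int = 3,
) -> Config:
    """Screen all candidates on a query subset, then rerun the best few in full."""
    # Low fidelity: rank every candidate using only the cheap query subset.
    ranked = sorted(candidates, key=lambda c: evaluate(c, subset, run_query))
    survivors = ranked[:keep]
    # Full fidelity: spend the expensive evaluations on the survivors only.
    return min(survivors, key=lambda c: evaluate(c, full_workload, run_query))


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [sample_config(rng) for _ in range(20)]
    workload = [f"q{i}" for i in range(1, 23)]  # 22 placeholder query names
    best = two_stage_search(candidates, workload[:4], workload, simulated_runtime)
    print(best)
```

In this toy model the subset ranking always agrees with the full-workload ranking, because each query's cost is a fixed weight times the same configuration-dependent factor. On a real cluster that agreement is not guaranteed, which is why the abstract stresses choosing representative subsets that preserve performance correlations.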

