AR | LG | May 19, 2025

Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads

arXiv:2505.13357v1 | h-index: 7 | DAC
Originality: Incremental advance
AI Analysis

This work addresses the scalability and hardware-availability limits of autotuning for machine learning workloads and offers a practical option for embedded and other diverse architectures. It is an incremental advance, since it builds on existing autotuning and simulation methods.

The paper tackles the acceleration of machine learning workloads by enabling autotuning on simulators instead of the target hardware. Predictors trained on simulation statistics always place the implementation with the best actual run time within the top 3% of predictions across the tested x86, ARM, and RISC-V architectures. In the best case, the approach outperforms native execution on embedded architectures with as few as three samples running on three simulators in parallel.

Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target hardware (HW). We present an interface that allows executing autotuning workloads on simulators. This approach offers high scalability when the availability of the target HW is limited, as many simulations can be run in parallel on any accessible HW. Additionally, we evaluate the feasibility of using fast instruction-accurate simulators for autotuning. We train various predictors to forecast the performance of ML workload implementations on the target HW based on simulation statistics. Our results demonstrate that the tuned predictors are highly effective. The best workload implementation in terms of actual run time on the target HW is always within the top 3 % of predictions for the tested x86, ARM, and RISC-V-based architectures. In the best case, this approach outperforms native execution on the target HW for embedded architectures when running as few as three samples on three simulators in parallel.
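The "top 3 % of predictions" criterion in the abstract can be read as a ranking check: sort the candidate implementations by predicted run time and ask whether the one that is actually fastest on the target HW lands in the leading fraction of that ranking. The following is a minimal sketch of that reading, not the authors' evaluation code. The function name `best_in_top_fraction`, the rounding of the cutoff, and all numbers are assumptions made for illustration.

```python
import math


def best_in_top_fraction(predicted, actual, fraction):
    """Return True if the implementation with the lowest actual run time
    is ranked within the top `fraction` of the predicted ranking.

    Hypothetical helper: one plausible reading of the "top 3 %" metric.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Index of the truly fastest implementation on the target hardware.
    best = min(range(len(actual)), key=actual.__getitem__)
    # Rank implementations by predicted run time, fastest first.
    ranking = sorted(range(len(predicted)), key=predicted.__getitem__)
    # Number of candidates counted as the "top fraction" (at least one).
    cutoff = max(1, math.ceil(fraction * len(predicted)))
    return best in ranking[:cutoff]


# Made-up run times in milliseconds, for illustration only (not from the paper).
predicted_ms = [5.1, 3.9, 4.2, 7.5, 4.0, 6.3, 8.8, 5.6, 9.1, 4.9]
actual_ms = [5.0, 4.1, 4.0, 7.7, 4.3, 6.0, 9.0, 5.5, 9.4, 5.2]

# The fastest implementation is ranked third by the toy predictions,
# so it is inside the top 50 % but outside the top 20 %.
in_top_half = best_in_top_fraction(predicted_ms, actual_ms, 0.5)
in_top_fifth = best_in_top_fraction(predicted_ms, actual_ms, 0.2)
```

With a check like this, a predictor does not need to forecast run times exactly. It only needs to rank the truly best implementation high enough that re-measuring a small top slice on the target HW would find it.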

