Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization
This work addresses performance tuning of scientific computation kernels on GPUs and AI accelerators. Its contribution is incremental, since it builds on existing TVM autotuning methods.
The authors tackle the problem of optimizing dense matrix factorizations, such as LU and Cholesky decomposition, on GPUs and AI accelerators using Apache TVM. Their proposed autotuning framework, based on Bayesian Optimization, outperforms the existing AutoTVM framework in most cases.
Apache TVM (Tensor Virtual Machine), an open-source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (lower-upper) decomposition and Cholesky decomposition on GPUs and AI (artificial intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with AutoTVM, the existing TVM autotuning framework, using four of its tuners, and find that our framework outperforms AutoTVM in most cases.
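To make the idea of autotuning a kernel concrete, the following is a minimal sketch in plain Python, not TVM code and not the authors' framework. It assumes that 3mm follows the PolyBench definition, G = (A·B)·(C·D), and that the tunable parameter is a single loop tile size. Both the parameter and the helper names (`matmul_tiled`, `three_mm`, `measure`, `tune`) are hypothetical and chosen for illustration only. Random sampling of the search space stands in for the Bayesian Optimization step, which would instead fit a surrogate model to past timings and use it to choose the next configuration.

```python
import random
import time


def matmul_tiled(a, b, tile):
    """Multiply a (n x k) by b (k x m) using square loop tiles of size `tile`."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one tile; min() handles sizes not divisible by `tile`.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_out = a[i], out[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_out[j] += a_ip * row_b[j]
    return out


def three_mm(a, b, c, d, tile):
    """3mm as assumed here (PolyBench style): G = (A @ B) @ (C @ D)."""
    e = matmul_tiled(a, b, tile)
    f = matmul_tiled(c, d, tile)
    return matmul_tiled(e, f, tile)


def measure(tile, mats):
    """Return the wall-clock time of one 3mm run with the given tile size."""
    start = time.perf_counter()
    three_mm(*mats, tile)
    return time.perf_counter() - start


def tune(mats, candidates, budget, seed=0):
    """Evaluate up to `budget` randomly sampled tile sizes and keep the fastest.

    Random sampling is a stand-in for the Bayesian Optimization step.
    """
    rng = random.Random(seed)
    trials = rng.sample(candidates, min(budget, len(candidates)))
    timings = {tile: measure(tile, mats) for tile in trials}
    best = min(timings, key=timings.get)
    return best, timings


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random floats."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(42)
n = 24  # small problem size so the sketch runs quickly
mats = [random_matrix(n, n, rng) for _ in range(4)]
TILE_CANDIDATES = [1, 2, 4, 8, 12, 24]  # hypothetical search space
best_tile, timings = tune(mats, TILE_CANDIDATES, budget=4)
print("best tile size:", best_tile)
```

The fastest tile size depends on the machine and on timing noise, so no particular output is expected. In a real TVM setting, the measured quantity would be the run time of the compiled kernel on the target device, and the search space would cover the schedule parameters exposed by the tensor expression implementation.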