Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization
This work addresses the problem of efficient performance tuning for developers in domains such as deep learning and scientific computing, offering a novel approach that generalizes beyond prior specialized methods.
The paper tackles the repetitive task of performance optimization by constructing an embedding space for subprograms, which allows tuning knowledge to be transferred between applications. This transfer reduces search complexity by up to four orders of magnitude and outperforms Intel MKL in sparse-dense matrix multiplication.
Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests via symbolic code analysis and performance profiling, respectively. Performance embeddings enable the direct transfer of performance tuning between applications, whether that tuning results from autotuning or from tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, and the approach outperforms prior specialized state-of-the-art approaches while generalizing beyond their capabilities.
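The transfer tuning idea can be illustrated with a minimal Python sketch. This is not the paper's implementation. The four features (`depth`, `arithmetic_intensity`, `cache_miss_rate`, `ipc`), their normalization in `embed`, the Euclidean nearest-neighbor search, the recipe strings, and `toy_measure` are all hypothetical stand-ins for the symbolic analysis, profiling counters, and embedding the paper actually uses. The sketch shows the overall shape: loop nests become points in a continuous space, and a new nest reuses the recipes of its closest already-tuned neighbors.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopNest:
    """A subprogram (loop nest) described by illustrative static and dynamic features."""
    name: str
    depth: int                   # static: number of nested loops (hypothetical feature)
    arithmetic_intensity: float  # static: estimated flops per byte moved (hypothetical)
    cache_miss_rate: float       # dynamic: profiled cache-miss fraction (hypothetical)
    ipc: float                   # dynamic: profiled instructions per cycle (hypothetical)
    recipe: list[str] = field(default_factory=list)  # tuning found for this nest, if any


def embed(nest: LoopNest) -> list[float]:
    """Hand-crafted, roughly normalized feature vector (a stand-in for a learned embedding)."""
    return [
        nest.depth / 4.0,
        math.log1p(nest.arithmetic_intensity),  # compress the wide range of intensities
        nest.cache_miss_rate,
        nest.ipc / 4.0,
    ]


def nearest(target: LoopNest, database: list[LoopNest], k: int) -> list[LoopNest]:
    """Return the k tuned nests closest to `target` by Euclidean distance in embedding space."""
    query = embed(target)
    return sorted(database, key=lambda n: math.dist(query, embed(n)))[:k]


def transfer_tune(target: LoopNest, database: list[LoopNest],
                  measure: Callable[[LoopNest, list[str]], float], k: int = 2):
    """Evaluate only the recipes of the k nearest neighbors and keep the fastest.

    `measure(nest, recipe)` stands in for applying a recipe and timing the result,
    so the cost is k measurements rather than a search over the whole space.
    """
    candidates = nearest(target, database, k)
    best = min(candidates, key=lambda n: measure(target, n.recipe))
    return best.name, best.recipe


# Hypothetical database of already-tuned loop nests.
db = [
    LoopNest("gemm", 3, 16.0, 0.02, 2.5, ["tile(32,32,32)", "vectorize(k)"]),
    LoopNest("stencil2d", 2, 0.5, 0.10, 1.2, ["tile(64,8)", "fuse"]),
    LoopNest("spmm", 3, 0.8, 0.25, 0.9, ["reorder(j,k)", "parallelize(i)"]),
]
new = LoopNest("batched_gemm", 3, 12.0, 0.03, 2.3)  # untuned target


def toy_measure(nest: LoopNest, recipe: list[str]) -> float:
    # Toy cost model: pretend vectorization halves the runtime.
    # A real system would compile the transformed code and time it.
    return 1.0 if "vectorize(k)" in recipe else 2.0


print([n.name for n in nearest(new, db, k=3)])  # ['gemm', 'spmm', 'stencil2d']
print(transfer_tune(new, db, toy_measure))       # ('gemm', ['tile(32,32,32)', 'vectorize(k)'])
```

Keeping `k` small is what bounds the cost in this sketch. Only the neighbors' recipes are timed on the new loop nest, instead of searching the full optimization space from scratch.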