Jul 26, 2020

Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

arXiv:2007.13055v1
1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses performance optimization for sparse matrix operations in deep learning, but the contribution is incremental: it applies existing compiler techniques to a specific problem.

The paper optimizes matrix multiplication between dense and block-sparse matrices on CUDA using TVM and, through automatic parameter tuning, achieves performance competitive with or better than state-of-the-art frameworks.
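The summary does not spell out the sparse storage format or which operand is sparse. As an illustration only, here is a minimal pure-Python reference for Y = X · W, assuming the block-sparse operand W (K x N) sits on the right and is stored in block sparse row (BSR) form with `data`, `indices`, `indptr` arrays in the style of SciPy's `bsr_matrix`. The helper names `dense_to_bsr` and `dense_times_bsr` are hypothetical. The point it shows is why block sparsity helps: only nonzero blocks are stored and visited.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Block = List[List[float]]


def dense_to_bsr(w: Matrix, br: int, bc: int) -> Tuple[List[Block], List[int], List[int]]:
    """Convert a dense K x N matrix into BSR arrays (data, indices, indptr).

    Only blocks of size br x bc that contain at least one nonzero are kept.
    indptr[bi]..indptr[bi + 1] delimits the stored blocks of block row bi,
    and indices[p] is the block-column index of stored block p.
    """
    k, n = len(w), len(w[0])
    assert k % br == 0 and n % bc == 0, "dimensions must be multiples of the block size"
    data: List[Block] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for bi in range(k // br):
        for bj in range(n // bc):
            block = [row[bj * bc:(bj + 1) * bc] for row in w[bi * br:(bi + 1) * br]]
            if any(v != 0 for r in block for v in r):
                data.append(block)
                indices.append(bj)
        indptr.append(len(indices))
    return data, indices, indptr


def dense_times_bsr(x: Matrix, data: List[Block], indices: List[int],
                    indptr: List[int], br: int, bc: int, n: int) -> Matrix:
    """Compute Y = X @ W (M x N) with W in BSR form; zero blocks are never touched."""
    m = len(x)
    y = [[0.0] * n for _ in range(m)]
    for bi in range(len(indptr) - 1):              # block row of W = a slice of the K axis
        for p in range(indptr[bi], indptr[bi + 1]):
            bj, block = indices[p], data[p]         # stored nonzero block at (bi, bj)
            for i in range(m):
                for kk in range(br):
                    xv = x[i][bi * br + kk]
                    for jj in range(bc):
                        y[i][bj * bc + jj] += xv * block[kk][jj]
    return y
```

A GPU kernel would parallelize these loops and tile them; a scalar reference like this is mainly useful as a correctness oracle when checking tuned kernels.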

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With the automatic parameter tuning in TVM, our cross-thread-reduction-based implementation achieved performance competitive with or better than other state-of-the-art frameworks.
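The abstract names a cross-thread reduction but does not describe the kernel. The sketch below simulates one common form of the idea on the CPU, assuming a power-of-two group of T threads (T = 32 mirrors a CUDA warp): each thread sums a strided slice of the reduction axis into a private partial sum, then the partials are combined in log2(T) tree steps, the pattern that warp shuffle-down intrinsics implement on a GPU. The function names are hypothetical, and this is not presented as the paper's actual kernel.

```python
from typing import List, Sequence


def cross_thread_reduce(terms: Sequence[float], num_threads: int) -> float:
    """Simulate a cross-thread (e.g. warp-level) sum reduction sequentially.

    Phase 1: thread t accumulates terms t, t + T, t + 2T, ... (strided split).
    Phase 2: a tree reduction halves the number of active threads each step.
    """
    assert num_threads > 0 and num_threads & (num_threads - 1) == 0, "T must be a power of two"
    partial: List[float] = [0.0] * num_threads
    for t in range(num_threads):
        for idx in range(t, len(terms), num_threads):
            partial[t] += terms[idx]
    offset = num_threads // 2
    while offset > 0:
        # Thread t (t < offset) adds the partial held by thread t + offset.
        # Reads come from indices >= offset, so no value is overwritten before use.
        for t in range(offset):
            partial[t] += partial[t + offset]
        offset //= 2
    return partial[0]


def row_dot_cross_thread(x_row: Sequence[float], w_row: Sequence[float],
                         num_threads: int = 32) -> float:
    """One output element of a matrix product, with its reduction split across threads."""
    terms = [a * b for a, b in zip(x_row, w_row)]
    return cross_thread_reduce(terms, num_threads)
```

In a block-sparse kernel the reduction terms would come only from the nonzero blocks that contribute to an output element, so splitting that short reduction across threads, rather than assigning it to a single thread, can help keep the threads of a warp occupied.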

Code Implementations: 1 repo
