LG · Jan 30, 2023

Operator Fusion in XLA: Analysis and Evaluation

arXiv:2301.13062v1 · 11 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

The paper offers ML compiler developers and users insight into optimizing tensor programs, though its contribution is incremental because it analyzes existing XLA methods rather than proposing new ones.

The paper addresses the limited understanding of how XLA, a machine learning compiler, applies kernel fusion. It finds that implementing specific XLA kernel fusion strategies can achieve up to 10.56x speedup over a baseline implementation in the Cartpole reinforcement learning environment.

Machine learning (ML) compilers are an active area of research because they offer the potential to automatically speed up tensor programs. Kernel fusion is often cited as an important optimization performed by ML compilers. However, there exists a knowledge gap about how XLA, the most common ML compiler, applies this nuanced optimization, what kind of speedup it can afford, and what low-level effects it has on hardware. Our paper aims to bridge this knowledge gap by studying key compiler passes of XLA's source code. Our evaluation on a reinforcement learning environment, Cartpole, shows how different fusion decisions in XLA are made in practice. Furthermore, we implement several XLA kernel fusion strategies that can achieve up to 10.56x speedup compared to our baseline implementation.
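To make the idea of kernel fusion concrete, the following is a minimal conceptual sketch in plain Python. It is not XLA's implementation and does not reproduce the paper's experiments. The functions `unfused` and `fused`, the chosen operations (scale, shift, `tanh`), and the launch counter are all hypothetical, picked only for illustration. Each list comprehension stands in for one kernel that reads its input from memory and writes a full intermediate result back.

```python
import math


def unfused(xs):
    # Three separate "kernels": each one reads a whole list and writes a
    # whole new list, standing in for a round trip through device memory.
    a = [x * 2.0 for x in xs]      # kernel 1: scale
    b = [v + 1.0 for v in a]       # kernel 2: shift
    c = [math.tanh(v) for v in b]  # kernel 3: activation
    return c, 3                    # result and number of kernel launches


def fused(xs):
    # One "kernel": the per-element intermediates never leave the
    # expression, so no intermediate list is materialized.
    return [math.tanh(x * 2.0 + 1.0) for x in xs], 1


xs = [0.0, 0.5, -1.0]
out_unfused, launches_unfused = unfused(xs)
out_fused, launches_fused = fused(xs)
```

Both versions compute the same values, but the fused version uses one pass and no intermediate lists. On an accelerator, the analogous saving is fewer kernel launches and less memory traffic, which is the kind of effect the abstract attributes to fusion.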

