Stripe: Tensor Compilation via the Nested Polyhedral Model
Maintaining state-of-the-art performance with hand-tuned kernels demands excessive effort from ML engineers and hardware developers; Stripe is proposed as a more scalable alternative.
The paper addresses the problem of generating high-performance code for rapidly evolving hardware and machine learning libraries. It introduces Stripe, an intermediate representation based on the Nested Polyhedral Model. Stripe enables a compiler in which algorithms, optimizations, and hardware accelerators can be developed independently, and the paper discusses its design exploration advantages over kernel libraries and schedule-based or schedule-space-based code generation.
Hardware architectures and machine learning (ML) libraries evolve rapidly. Traditional compilers often fail to generate high-performance code across the spectrum of new hardware offerings. To mitigate this, engineers develop hand-tuned kernels for each ML library update and hardware upgrade. Unfortunately, this approach requires excessive engineering effort to scale and to maintain state-of-the-art performance. Here we present a Nested Polyhedral Model for representing highly parallelizable computations with limited dependencies between iterations. This model provides an underlying framework for an intermediate representation (IR) called Stripe, amenable to standard compiler techniques while naturally modeling key aspects of modern ML computing. Stripe represents parallelism, efficient memory layout, and multiple compute units at a level of abstraction amenable to automatic optimization. We describe how Stripe enables a compiler for ML in the style of LLVM that allows independent development of algorithms, optimizations, and hardware accelerators. We also discuss the design exploration advantages of Stripe over kernel libraries and schedule-based or schedule-space-based code generation.
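To make the idea of nested iteration spaces concrete, the following is a minimal Python sketch of a tiled matrix multiply written as two nested blocks, each iterating over the integer points of a polyhedron (a box plus affine constraints). It is an illustration only, not Stripe's syntax or the authors' implementation: the helper `polyhedron`, the function `matmul_nested`, and the `tile` parameter are hypothetical names chosen for this example. The only interaction between iterations is accumulation with `+`, which does not depend on order, so the points of each block could in principle run in parallel.

```python
from itertools import product


def polyhedron(ranges, constraints=()):
    """Yield the integer points of a box that satisfy every constraint.

    `ranges` gives the extent of each index; each constraint is a predicate
    over the index tuple (here always an affine inequality).
    Hypothetical helper, not part of any Stripe API.
    """
    for point in product(*(range(r) for r in ranges)):
        if all(c(*point) for c in constraints):
            yield point


def matmul_nested(A, B, M, N, K, tile=2):
    """Compute C = A @ B for an M x K matrix A and a K x N matrix B."""
    C = [[0] * N for _ in range(M)]
    # Number of tiles per dimension (ceiling division).
    tiles = (-(-M // tile), -(-N // tile), -(-K // tile))

    # Outer block: one point per tile.
    for io, jo, ko in polyhedron(tiles):
        # Inner block: points inside the tile. The constraints are affine in
        # the outer and inner indices and trim tiles that overhang the edge.
        in_bounds = (
            lambda i, j, k: io * tile + i < M,
            lambda i, j, k: jo * tile + j < N,
            lambda i, j, k: ko * tile + k < K,
        )
        for ii, ji, ki in polyhedron((tile, tile, tile), in_bounds):
            i, j, k = io * tile + ii, jo * tile + ji, ko * tile + ki
            # Accumulation with + is order-independent across iterations.
            C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
result = matmul_nested(A, B, M=2, N=2, K=3)
```

In this sketch, changing `tile` alters only how the iteration space is split between the outer and inner blocks, not the result, which is the kind of restructuring a compiler pass could apply without touching the algorithm's definition.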