LazyTensor: combining eager execution with domain-specific compilers
LazyTensor addresses the language subset problem for ML practitioners who use eager frameworks: they can target optimizing compilers while keeping the full expressivity of the host language. The contribution is incremental in that it builds on existing compiler and framework concepts.
The paper combines the ease of use of eager execution in ML frameworks with the performance benefits of domain-specific compilers. The resulting technique, LazyTensor, achieves this integration without sacrificing ergonomics and has been applied across multiple Tensor implementations, hardware accelerators, and programming languages.
Domain-specific optimizing compilers have demonstrated significant performance and portability benefits, but require programs to be represented in their specialized IRs. Existing frontends to these compilers suffer from the "language subset problem", where some host language features are unsupported in the subset of the user's program that interacts with the domain-specific compiler. By contrast, define-by-run ML frameworks (colloquially called "eager" mode) are popular due to their ease of use and expressivity, where the full power of the host programming language can be used. LazyTensor is a technique to target domain-specific compilers without sacrificing define-by-run ergonomics. Initially developed to support PyTorch on Cloud TPUs, the technique, along with a substantially shared implementation, has been used by Swift for TensorFlow across CPUs, GPUs, and TPUs, demonstrating the generality of the approach across (1) Tensor implementations, (2) hardware accelerators, and (3) programming languages.
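The general idea of deferring tensor operations so that they can be handed to a compiler can be illustrated with a toy sketch. This is not the paper's implementation: the class `LazyTensor` below, its methods `const`, `trace`, and `materialize`, and the `TRACES` log are all hypothetical names chosen for illustration, and a pure-Python interpreter loop stands in for the domain-specific compiler. The sketch assumes only what the abstract states, namely that user code keeps define-by-run ergonomics while operations are made available to a compiler backend.

```python
import operator

# Stand-in "kernels"; a real system would hand the trace to a compiler.
_KERNELS = {"add": operator.add, "mul": operator.mul}

# Log of every trace that was handed to the stand-in backend (illustration only).
TRACES = []


class LazyTensor:
    """Hypothetical toy tensor that records operations instead of running them."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # operand LazyTensors
        self.value = value    # concrete data once materialized, else None

    @staticmethod
    def const(data):
        return LazyTensor("const", value=list(data))

    def __add__(self, other):
        return LazyTensor("add", (self, other))

    def __mul__(self, other):
        return LazyTensor("mul", (self, other))

    def trace(self):
        """Return the pending operations in topological order."""
        order, seen = [], set()

        def visit(node):
            # Skip nodes already visited or already holding concrete data.
            if id(node) in seen or node.value is not None:
                return
            seen.add(id(node))
            for operand in node.inputs:
                visit(operand)
            order.append(node)

        visit(self)
        return order

    def materialize(self):
        """Evaluate the pending trace and return concrete values."""
        pending = self.trace()
        if pending:
            TRACES.append([node.op for node in pending])
        for node in pending:
            lhs, rhs = node.inputs
            kernel = _KERNELS[node.op]
            node.value = [kernel(x, y) for x, y in zip(lhs.value, rhs.value)]
        return self.value


a = LazyTensor.const([1.0, 2.0])
b = LazyTensor.const([3.0, 4.0])
c = a * b + a                 # recorded, not executed
pending_before = c.value      # still None: nothing has run yet

# Ordinary host-language control flow; observing a value forces evaluation.
if c.materialize()[0] > 3.0:
    c = c + b

result = c.materialize()
```

In this sketch the Python `if` statement needs no special treatment: reading a concrete value simply ends the current trace, and recording resumes afterwards. That is one way to see how a deferred-execution design could avoid restricting the host language, although a production system would also need to cache compiled traces and manage device memory, which the toy omits.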