LG · AI · Nov 1, 2021

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement

arXiv:2111.00655v3 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work targets the difficulty that hardware vendors, framework developers, and machine learning practitioners face in quickly deploying and optimizing new backends. It is rated incremental because it builds on existing integration methods.

The paper tackles the inefficient integration of diverse deep learning backends into frameworks, which currently requires manual effort and limits performance. It proposes Collage, a framework that automatically searches for an optimized backend placement strategy, and reports average speedups of 1.26x to 1.43x over the best existing framework on each evaluated platform (NVIDIA RTX 2070 GPU, NVIDIA V100 GPU, and Intel Xeon 8259CL CPU).

The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. However, current DL frameworks require significant manual effort and expertise to integrate every new backend while failing to unleash its full potential. Given the fast-evolving nature of the DL ecosystem, this manual approach often slows down continuous innovations across different layers: hardware vendors cannot quickly deploy their cutting-edge libraries, DL framework developers must repeatedly adjust their hand-coded rules to accommodate new versions of libraries, and machine learning practitioners must wait for the integration of new technologies and often encounter unsatisfactory performance. In this paper, we propose Collage, a DL framework that offers seamless integration of DL backends. Collage provides an expressive backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage automatically searches for an optimized backend placement strategy for a given workload and execution environment. Our evaluation shows that Collage outperforms the best existing framework for each hardware by $1.26\times$, $1.43\times$, and $1.40\times$ on average on NVIDIA's RTX 2070 GPU, V100 GPU, and Intel's Xeon 8259CL CPU, respectively. Collage has been open-sourced and deployed in Apache TVM.
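To make the idea of "backend registration plus automatic placement" concrete, here is a minimal toy sketch in Python. It is not Collage's interface or its search algorithm, and it does not use any Apache TVM API. The `Backend` class, the `place` function, the operator names, and all costs are hypothetical, and the workload is simplified to a linear chain of operators. Each backend declares which operator patterns it can run and an assumed cost for each; a small dynamic program then picks the cheapest way to cover the chain.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend spec: name plus the op patterns it supports.

    `patterns` maps a tuple of consecutive op names to an assumed cost
    in microseconds (made-up numbers, not measurements).
    """
    name: str
    patterns: dict


def place(ops, backends):
    """Cover a linear chain of ops with backend patterns at minimum total cost.

    best[i] holds the cheapest known cost of executing ops[:i].
    """
    n = len(ops)
    inf = float("inf")
    best = [inf] * (n + 1)
    choice = [None] * (n + 1)
    best[0] = 0
    for end in range(1, n + 1):
        for backend in backends:
            for pattern, cost in backend.patterns.items():
                start = end - len(pattern)
                if start < 0 or tuple(ops[start:end]) != pattern:
                    continue
                if best[start] + cost < best[end]:
                    best[end] = best[start] + cost
                    choice[end] = (start, backend.name, pattern)
    if best[n] == inf:
        raise ValueError("no backend combination covers the workload")
    # Walk the recorded choices backwards to recover the placement.
    plan = []
    end = n
    while end > 0:
        start, name, pattern = choice[end]
        plan.append((name, pattern))
        end = start
    plan.reverse()
    return best[n], plan


# Hypothetical backends and costs, for illustration only.
backends = [
    Backend("tvm", {("conv2d",): 1000, ("bias_add",): 300,
                    ("relu",): 200, ("dense",): 800}),
    Backend("cudnn", {("conv2d",): 900,
                      ("conv2d", "bias_add", "relu"): 1100}),
    Backend("cublas", {("dense",): 500}),
]
ops = ["conv2d", "bias_add", "relu", "dense"]
total, plan = place(ops, backends)
print(total, plan)
```

In this toy setup the fused three-operator pattern is cheaper than running the operators one by one, so the search assigns it to one backend and sends the remaining `dense` operator to another. Real workloads are graphs rather than chains, and real costs would come from measurement, so this only illustrates the shape of the problem.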

Code Implementations: 1 repo
