LG | DC | Nov 2, 2020

Cortex: A Compiler for Recursive Deep Learning Models

arXiv:2011.01383v2 | 31 citations
AI Analysis

Cortex addresses performance bottlenecks that AI practitioners face with recursive models, offering a compiler-based solution.

The paper tackles the problem of optimizing recursive deep learning models for low-latency inference. It introduces Cortex, a compiler-based approach that performs end-to-end optimizations and achieves up to 14x lower inference latencies than past work.

Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low-level kernel optimizations such as those found in vendor libraries. This approach often leaves significant performance on the table, especially for recursive deep learning models. In this paper, we present Cortex, a compiler-based approach to generate highly efficient code for recursive models for low-latency inference. Our compiler approach and low reliance on vendor libraries enable us to perform end-to-end optimizations, leading to up to 14x lower inference latencies over past work, across different backends.
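To make "kernel fusion" on a recursive model concrete, here is a minimal pure-Python sketch. It is not Cortex's implementation or API. The toy model (each internal tree node computes tanh of the elementwise sum of its children) and all names (`Node`, `eval_unfused`, `eval_fused`) are assumptions chosen for illustration. The unfused version runs two separate "kernels" per node and materialises an intermediate vector between them; the fused version does the same arithmetic in one pass.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node: leaves carry an embedding, internal nodes have two children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    embedding: Optional[List[float]] = None


def add(a: List[float], b: List[float]) -> List[float]:
    # Stand-in for a separate elementwise-add kernel.
    return [x + y for x, y in zip(a, b)]


def tanh(v: List[float]) -> List[float]:
    # Stand-in for a separate elementwise-tanh kernel.
    return [math.tanh(x) for x in v]


def eval_unfused(node: Node) -> List[float]:
    if node.embedding is not None:
        return node.embedding
    left = eval_unfused(node.left)
    right = eval_unfused(node.right)
    tmp = add(left, right)  # first kernel writes an intermediate vector
    return tanh(tmp)        # second kernel reads it back


def eval_fused(node: Node) -> List[float]:
    if node.embedding is not None:
        return node.embedding
    left = eval_fused(node.left)
    right = eval_fused(node.right)
    # Add and tanh fused into one pass, with no intermediate vector.
    return [math.tanh(x + y) for x, y in zip(left, right)]


# A small three-leaf tree used only to compare the two evaluators.
tree = Node(
    Node(embedding=[0.1, 0.2]),
    Node(Node(embedding=[0.3, -0.4]), Node(embedding=[0.5, 0.6])),
)
```

Both evaluators return the same values for `tree`; the difference is only in how many passes and temporaries each node costs. In a real system that overhead is paid at every node of every input tree, which is one reason recursive models are sensitive to how well such steps are combined.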

