DC AI LGJan 9, 2025

Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs

arXiv:2501.05408v32.31 citationsh-index: 1SOSP

Originality Highly original

AI Analysis

This addresses a key bottleneck in deep learning systems for developers and researchers working with dynamic models like recurrent networks or reinforcement learning, offering a novel solution that bridges the gap between flexibility and performance.

The paper tackles the challenge of expressing and optimizing dynamic dependencies and tensor shapes in deep learning, which are difficult for existing eager or graph-based systems, by introducing Tempo, a system that combines eager execution's dynamism with graph-based compilation's optimizations, achieving up to 54x speedup and 16x lower memory usage in reinforcement learning.

Deep learning (DL) algorithms are often defined in terms of temporal relationships: a tensor at one timestep may depend on tensors from earlier or later timesteps. Such dynamic dependencies (and corresponding dynamic tensor shapes) are difficult to express and optimize: while eager DL systems support such dynamism, they cannot apply compiler-based optimizations; graph-based systems require static tensor shapes, which forces users to pad tensors or break-up programs into multiple static graphs. We describe Tempo, a new DL system that combines the dynamism of eager execution with the whole-program optimizations of graph-based compilation. Tempo achieves this through a declarative programming model with recurrent tensors, which include explicit temporal dimensions. Temporal dimensions can be indexed using symbolic expressions to express dynamic dependencies on past and future tensors. Based on this, Tempo constructs a symbolic dependence graph, which concisely encodes dynamic dependencies between operators, and applies whole-program optimizations, such as algebraic simplifications, vectorization, tiling, and fusion. By tiling dynamic dependencies into static-size blocks, Tempo can also reuse existing static code-generators. It then uses a polyhedral model to find a feasible execution schedule, which includes memory management operations. We show that Tempo achieves a 7$\times$ speedup over JAX for Llama-3.2-3B decoding; for reinforcement learning algorithms, Tempo achieves a 54$\times$ speedup, with 16$\times$ lower peak memory usage.

View on arXiv PDF

Similar