Marisa Kirisame

LG
4papers
256citations
Novelty55%
AI Score45

4 Papers

PLSep 26, 2018Code
Relay: A New IR for Machine Learning Frameworks

Jared Roesch, Steven Lyubomirsky, Logan Weber et al.

Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet.

86.3PLMar 20
Incremental Live Programming via Shortcut Memoization

Marisa Kirisame, Thomas J. Porter, Ruqing Yang et al.

Live programming systems aim to quickly show programmers the dynamic impacts of program edits. To do so, they re-execute the program whenever it is edited, which poses a computational challenge when programs become large or complex. This has led to the need for incrementality in the implementation of live program interpreters. This paper introduces Chordata, an incremental program interpreter based on shortcut memoization, which learns repeated patterns of computation, called shortcuts, by observing executions of previous versions of a program. It can then apply these shortcuts when the same or a structurally similar program fragment is re-executed. This paper contributes a formal semantics of shortcut memoization for any language with a rewrite-based semantics, with mechanized proofs of key correctness properties. We then express a variant of the Hazel live programming system, expressed as a CEK machine, in Chordata, and develop a number of practical heuristics to learn high-value shortcuts. We evaluate the resulting system on editing traces of students solving simple programming problems. Chordata achieves a speedup of 13.03\times compared to baseline with a 19.97\times memory overhead. For smaller changes and for more complex programs, Chordata achieves even greater speedups. Furthermore, we show that Chordata is capable of providing a speedup even within a single execution, with a faster speedup on a larger input.

LGJun 17, 2020
Dynamic Tensor Rematerialization

Marisa Kirisame, Steven Lyubomirsky, Altan Haan et al.

Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an $N$-layer linear feedforward network on an $Ω(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors.

LGApr 17, 2019
Relay: A High-Level Compiler for Deep Learning

Jared Roesch, Steven Lyubomirsky, Marisa Kirisame et al.

Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler framework for DL. Relay's functional, statically typed intermediate representation (IR) unifies and generalizes existing DL IRs to express state-of-the-art models. The introduction of Relay's expressive IR requires careful design of domain-specific optimizations, addressed via Relay's extension mechanisms. Using these extension mechanisms, Relay supports a unified compiler that can target a variety of hardware platforms. Our evaluation demonstrates Relay's competitive performance for a broad class of models and devices (CPUs, GPUs, and emerging accelerators). Relay's design demonstrates how a unified IR can provide expressivity, composability, and portability without compromising performance.