DC AR LG
May 10, 2023

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

arXiv:2305.05843v1
36 citations
Originality: Incremental advance
AI Analysis

This paper addresses resource management for latency-critical applications on multi-tenant DNN accelerators. It is an incremental improvement over existing solutions, which focus on partitioning compute resources rather than managing shared memory.

The paper tackles quality-of-service degradation caused by resource contention when multiple deep neural networks run on the same hardware. It proposes MoCA, an adaptive system that dynamically manages shared memory resources. Compared to prior work, MoCA improves the SLA satisfaction rate by up to 3.9x (1.8x on average), system throughput by up to 2.3x (1.7x on average), and fairness by up to 1.3x (1.2x on average).

Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on the same hardware, has been proposed to satisfy the latency requirements of different applications while improving the overall system utilization. However, multi-tenancy execution could lead to undesired system-level resource contention, causing quality-of-service (QoS) degradation for latency-critical applications. To address this challenge, we propose MoCA, an adaptive multi-tenancy system for DNN accelerators. Unlike existing solutions that focus on compute resource partition, MoCA dynamically manages shared memory resources of co-located applications to meet their QoS targets. Specifically, MoCA leverages the regularities in both DNN operators and accelerators to dynamically modulate memory access rates based on their latency targets and user-defined priorities so that co-located applications get the resources they demand without significantly starving their co-runners. We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average), compared to prior work.
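The abstract describes modulating each application's memory access rate according to its latency target and user-defined priority. The following is a minimal sketch of that general idea, not MoCA's actual algorithm: it assumes each tenant's bandwidth demand can be estimated as remaining memory traffic divided by time left until its target, and it splits a shared bandwidth budget in proportion to priority-weighted demand, capping each tenant at what it needs. The names `Tenant`, `allocate_bandwidth`, and all fields are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    traffic_left: float  # memory traffic still to move (GB), assumed known
    time_left: float     # time until the latency target (s)
    priority: float      # user-defined weight; higher means more important

    @property
    def demand(self) -> float:
        """Bandwidth (GB/s) needed to finish exactly at the latency target."""
        return self.traffic_left / max(self.time_left, 1e-9)


def allocate_bandwidth(tenants, total_bw):
    """Split total_bw (GB/s) among tenants without exceeding any demand.

    Illustrative policy (an assumption, not taken from the paper):
    shares are proportional to priority * demand; a tenant whose share
    would exceed its demand is capped, and the leftover is redistributed.
    """
    alloc = {t.name: 0.0 for t in tenants}
    active = list(tenants)
    remaining = total_bw
    while active and remaining > 1e-9:
        weight_sum = sum(t.priority * t.demand for t in active)
        if weight_sum <= 0:
            break
        shares = {
            t.name: remaining * t.priority * t.demand / weight_sum
            for t in active
        }
        capped = {t.name for t in active if shares[t.name] >= t.demand}
        if not capped:
            # Contended case: everyone is throttled below its demand.
            for t in active:
                alloc[t.name] = shares[t.name]
            break
        for t in active:
            if t.name in capped:
                alloc[t.name] = t.demand
                remaining -= t.demand
        active = [t for t in active if t.name not in capped]
    return alloc


if __name__ == "__main__":
    tenants = [
        Tenant("latency_critical", traffic_left=8.0, time_left=1.0, priority=2.0),
        Tenant("best_effort", traffic_left=4.0, time_left=1.0, priority=1.0),
    ]
    print(allocate_bandwidth(tenants, total_bw=6.0))
```

When the budget covers every demand, each tenant simply receives what it needs. Under contention, the higher-priority tenant is throttled less, but the lower-priority one still receives a nonzero share, which mirrors the abstract's goal of not significantly starving co-runners.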

Code Implementations: 1 repo