ET AI, Aug 9, 2024

A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu

arXiv:2408.04812v11.22 citations, h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses resource contention and latency issues in multi-tenant DNN deployments on PIM hardware, offering incremental optimizations for AI applications in computing systems.

The paper tackles the challenge of efficiently deploying multi-tenant deep neural networks on ReRAM-based processing-in-memory designs by proposing a novel framework that partitions hardware at the tenant level and reconstructs processing pipelines at the operator level, achieving speed improvements ranging from 1.75x to 60.43x and energy improvements up to 1.89x compared to direct deployments.

Modern Artificial Intelligence (AI) applications increasingly use multi-tenant deep neural networks (DNNs), which leads to a significant rise in computing complexity and in the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption, holds promising potential for supporting the deployment of multi-tenant DNNs. However, direct deployment of complex multi-tenant DNNs on existing ReRAM-based PIM designs poses challenges. Resource contention among different tenants can result in severe under-utilization of on-chip computing resources. Moreover, area-intensive operators and computation-intensive operators require excessively large on-chip areas and long processing times, leading to high overall latency during parallel computing. To address these challenges, we propose a novel ReRAM-based in-memory computing framework that enables efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention problems by iteratively partitioning the PIM hardware at the tenant level. In addition, we construct a fine-grained reconstructed processing pipeline at the operator level to handle area-intensive operators. Compared to direct deployments on traditional ReRAM-based PIM designs, our proposed PIM computing framework achieves significant improvements in speed (ranging from 1.75x to 60.43x) and energy (up to 1.89x).
