AR | Oct 16, 2025

Computing-In-Memory Aware Model Adaption For Edge Devices

arXiv:2510.14379 | h-index: 3

Originality: Incremental advance

AI Analysis

For edge device practitioners using CIM accelerators, this work provides a practical method to improve resource utilization and reduce latency without sacrificing accuracy.

The paper targets the throughput and accuracy bottlenecks that limited macro size and ADC precision impose on CIM macros. It proposes a two-stage model adaptation process that reaches 90% CIM array utilization, up to 93% compression, and concurrent activation of up to 256 word lines, while keeping accuracy comparable to prior methods.
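To make the utilization figure concrete, here is a minimal sketch of how array utilization could be computed when a layer's weight matrix is tiled onto fixed-size macros. The function name `array_utilization`, the 256 x 64 macro size, and the layer shapes are all hypothetical and chosen for illustration; the summary does not state the paper's actual mapping scheme.

```python
import math


def array_utilization(rows, cols, macro_rows, macro_cols):
    """Fraction of allocated CIM cells that actually hold weights.

    Assumes a rows x cols weight matrix is tiled onto macros of size
    macro_rows x macro_cols, and that a partially filled macro still
    counts as fully allocated (an illustrative assumption).
    """
    tiles = math.ceil(rows / macro_rows) * math.ceil(cols / macro_cols)
    return (rows * cols) / (tiles * macro_rows * macro_cols)


# Hypothetical 3x3 convolution with 32 input channels and 64 filters:
# 288 rows spill into a second macro, which then sits mostly empty.
before = array_utilization(288, 64, 256, 64)

# The same layer trimmed to 28 input channels (252 rows) fits one macro.
after = array_utilization(252, 64, 256, 64)
```

Under these assumptions, a layer sized without regard to the macro wastes almost half of the allocated cells, while a layer resized to fit the macro boundary uses nearly all of them. This is the kind of effect a macro-size-aware reallocation step would exploit.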

Computing-in-Memory (CIM) macros have gained popularity for deep learning acceleration due to their highly parallel computation and low power consumption. However, limited macro size and ADC precision introduce throughput and accuracy bottlenecks. This paper proposes a two-stage CIM-aware model adaptation process. The first stage compresses the model and reallocates resources based on layer importance and macro size constraints, reducing model weight loading latency while improving resource utilization and maintaining accuracy. The second stage performs quantization-aware training, incorporating partial sum quantization and ADC precision to mitigate quantization errors in inference. The proposed approach enhances CIM array utilization to 90%, enables concurrent activation of up to 256 word lines, and achieves up to 93% compression, all while preserving accuracy comparable to previous methods.
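The partial-sum quantization mentioned in the abstract can be illustrated with a small forward-pass model. This is a sketch under stated assumptions, not the paper's implementation: the function `cim_dot`, the unsigned uniform ADC model, and the `full_scale` parameter are hypothetical, and the training side (for example, how gradients pass through the rounding step) is not shown.

```python
import numpy as np


def cim_dot(x, w, rows_per_activation=256, adc_bits=8, full_scale=255.0):
    """Dot product accumulated in word-line groups with ADC-limited partial sums.

    Assumptions (illustrative only): inputs and weights are non-negative,
    and each group's analog partial sum is read by a uniform ADC covering
    [0, full_scale] with 2**adc_bits output codes.
    """
    x = np.asarray(x, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    step = full_scale / (2 ** adc_bits - 1)  # size of one ADC code
    total = 0.0
    for start in range(0, x.size, rows_per_activation):
        stop = start + rows_per_activation
        partial = float(x[start:stop] @ w[start:stop])  # analog partial sum
        clipped = min(max(partial, 0.0), full_scale)    # ADC saturation
        total += float(np.rint(clipped / step)) * step  # ADC rounding
    return total


# 512 active rows are processed as two groups of 256 word lines.
x = np.ones(512)
w = np.ones(512)
exact = float(x @ w)
limited = cim_dot(x, w, rows_per_activation=256, adc_bits=8, full_scale=255.0)
```

In this toy case each group's partial sum slightly exceeds the ADC range and saturates, so `limited` falls below `exact`. Exposing this clipping and rounding during training is what lets a quantization-aware model adapt its weights to the ADC precision.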
