AR · AI · Feb 10, 2025

Low-Rank Compression for IMC Arrays

arXiv:2502.07820v1 · 2 citations · h-index: 5 · DATE
Originality: Highly original
AI Analysis

This work is relevant to researchers and developers working on in-memory computing (IMC) architectures: it proposes low-rank compression as a model-compression method that improves incrementally on existing pruning techniques.

The authors tackle model compression for in-memory computing architectures and report up to a 2.5x speedup or a +20.9% accuracy boost over existing pruning techniques. To get there, they address two weaknesses of low-rank compression: suboptimal array utilization and compromised accuracy.

In this study, we address the challenge of low-rank model compression in the context of in-memory computing (IMC) architectures. Traditional pruning approaches, while effective in model size reduction, necessitate additional peripheral circuitry to manage complex dataflows and mitigate dislocation issues, leading to increased area and energy overheads. To circumvent these drawbacks, we propose leveraging low-rank compression techniques, which, unlike pruning, streamline the dataflow and seamlessly integrate with IMC architectures. However, low-rank compression presents its own set of challenges, namely i) suboptimal IMC array utilization and ii) compromised accuracy. To address these issues, we introduce a novel approach i) employing the shift and duplicate kernel (SDK) mapping technique, which exploits idle IMC columns for parallel processing, and ii) group low-rank convolution, which mitigates the information imbalance in the decomposed matrices. Our experimental results demonstrate that our proposed method achieves up to a 2.5x speedup or a +20.9% accuracy boost over existing pruning techniques.
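The array-utilization problem mentioned in the abstract can be illustrated with some simple arithmetic. The sketch below is not the paper's method: it assumes a convolution layer is unrolled into a weight matrix whose rows are inputs and whose columns are output channels, that a rank-r factorization replaces the matrix W (m x n) with two factors A (m x r) and B (r x n), and that each factor is tiled onto fixed-size IMC arrays. The layer shape, rank, array size, and the helper names `lowrank_params` and `array_usage` are all hypothetical choices made for illustration.

```python
from math import ceil


def lowrank_params(rows, cols, rank):
    """Weights stored by a rank-`rank` factorization of a rows x cols matrix:
    A is rows x rank and B is rank x cols."""
    return rank * (rows + cols)


def array_usage(rows, cols, array_rows, array_cols):
    """Tile a rows x cols matrix onto arrays of size array_rows x array_cols.
    Returns (number of arrays, fraction of allocated cells that hold weights)."""
    n_arrays = ceil(rows / array_rows) * ceil(cols / array_cols)
    utilization = (rows * cols) / (n_arrays * array_rows * array_cols)
    return n_arrays, utilization


# Hypothetical layer: 3x3 convolution with 64 input and 64 output channels,
# unrolled into a (3*3*64) x 64 weight matrix.
rows, cols = 3 * 3 * 64, 64
rank = 16                            # assumed rank, for illustration only
array_rows, array_cols = 128, 128    # assumed IMC array size

dense_params = rows * cols
lr_params = lowrank_params(rows, cols, rank)

dense_usage = array_usage(rows, cols, array_rows, array_cols)
first_factor = array_usage(rows, rank, array_rows, array_cols)   # A: 576 x 16
second_factor = array_usage(rank, cols, array_rows, array_cols)  # B: 16 x 64

# Columns left unused in each array that holds a slice of the first factor.
idle_columns = array_cols - rank

print(f"dense: {dense_params} weights, arrays/utilization = {dense_usage}")
print(f"low-rank: {lr_params} weights")
print(f"first factor arrays/utilization = {first_factor}")
print(f"second factor arrays/utilization = {second_factor}")
print(f"idle columns per first-factor array = {idle_columns}")
```

Under these assumptions the factorization stores far fewer weights, but the narrow first factor occupies only 16 of the 128 columns in each array it touches. Columns left idle in this way are what the abstract says the SDK mapping exploits for parallel processing.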

