Dengfeng Wang

2 Papers

AR | Jul 6, 2023
TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations

Dengfeng Wang, Liukai Xu, Songyuan Liu et al.

Accommodating all the weights on-chip for large-scale neural networks (NNs) remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs (SL-ReRAMs) on top of high-efficiency SRAM-CIM for weight storage, eliminating off-chip memory access. However, previous SL-nvSRAM-CIM scales poorly as the number of SL-ReRAMs increases and offers limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAM-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. The clustered n-selector-n-ReRAM (cluster-nSnRs) is employed for reliable weight restore without DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density than state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x higher energy efficiency compared to the baseline designs of SRAM-CIM and ReRAM-CIM, respectively.
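As a rough software illustration of what a ternary MAC with a differential scheme computes, the sketch below restricts weights to -1, 0 and +1, accumulates the inputs on a "positive" and a "negative" side, and takes the difference. This is an arithmetic model only: the function name `ternary_mac` and the two-accumulator split are assumptions made for illustration, and the abstract does not describe the circuit at this level of detail.

```python
def ternary_mac(inputs, weights):
    """Multiply-accumulate with weights restricted to -1, 0, +1.

    Hypothetical software model: inputs paired with +1 weights are summed
    on one side, inputs paired with -1 weights on the other, and the
    result is the difference of the two sums.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")

    positive_sum = 0  # accumulates inputs whose weight is +1
    negative_sum = 0  # accumulates inputs whose weight is -1
    for x, w in zip(inputs, weights):
        if w not in (-1, 0, 1):
            raise ValueError(f"weight {w!r} is not ternary")
        if w == 1:
            positive_sum += x
        elif w == -1:
            negative_sum += x
        # w == 0 contributes nothing to either side

    return positive_sum - negative_sum


inputs = [3, 1, 4, 2]
weights = [1, -1, 0, 1]
result = ternary_mac(inputs, weights)  # (3 + 2) - 1 = 4
```

The difference of the two sums equals the ordinary dot product of inputs and weights, so no multiplier is needed when weights are ternary.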

23.7 | AR | Mar 22
PC2IM: An Efficient In-Memory Computing Accelerator for 3D Point Cloud

Dengfeng Wang, Shunqin Cai, Yanan Sun

3D point cloud neural networks have significantly enhanced the perceptual capabilities of resource-limited mobile intelligent systems. However, despite this transformative impact, point cloud algorithms suffer from substantial memory access during data preprocessing and impose a heavy workload on feature computing, resulting in high energy consumption and latency. In this paper, an efficient SRAM-based computing-in-memory (SRAM-CIM) accelerator, PC2IM, is proposed to alleviate memory access bottlenecks in point-based 3D point cloud networks. A data preprocessing module driven by customized CIM engines is proposed and incorporated into a memory-efficient data flow. Specifically, an approximate distance SRAM-CIM (APD-CIM) is introduced to eliminate the repetitive on-chip memory access for point clouds that are spatially partitioned by the median and to reduce the volume of temporary distance data. Building on the APD-CIM, a two-level Ping-Pong-MAX Content Addressable Memory (Ping-Pong-MAX CAM) is introduced to adaptively update temporary distances and perform in-situ search for the maximum, further reducing memory access. Additionally, an efficient CIM-based feature computing engine, named split-concatenate SRAM-CIM, is presented to minimize computation latency in multi-layer perceptrons with high-precision inputs, while maintaining high area and energy efficiency. Experimental results show that the proposed PC2IM achieves a 1.5x speedup and 2.7x higher energy efficiency compared to a state-of-the-art point cloud accelerator. Moreover, PC2IM achieves a 3.5x speedup and 1518.9x higher energy efficiency compared to GPU implementations.
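The preprocessing pattern the abstract refers to, keeping a temporary distance per point, updating it, and searching for the maximum, can be sketched in software as follows. The abstract names neither the sampling algorithm nor the distance approximation, so this sketch assumes farthest point sampling and uses Manhattan distance as a stand-in for the approximate distance. The function names are hypothetical, and nothing here models the CIM or CAM hardware.

```python
def manhattan(p, q):
    """Manhattan (L1) distance; an assumed stand-in for an approximate distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def farthest_point_sampling(points, num_samples, distance=manhattan, start=0):
    """Select num_samples point indices, each as far as possible from those already chosen."""
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")

    selected = [start]
    # Temporary distances: for each point, the distance to its nearest selected point.
    temp_dist = [distance(p, points[start]) for p in points]

    while len(selected) < num_samples:
        # Search for the maximum temporary distance (first index wins on ties).
        farthest = max(range(len(points)), key=temp_dist.__getitem__)
        selected.append(farthest)
        # Update step: a temporary distance only ever shrinks.
        for i, p in enumerate(points):
            d = distance(p, points[farthest])
            if d < temp_dist[i]:
                temp_dist[i] = d

    return selected


points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (5, 0, 0)]
sampled = farthest_point_sampling(points, 3)  # indices [0, 2, 3]
```

Each iteration reads and rewrites the whole temporary distance array and then scans it for the maximum, which is the memory traffic the abstract says the accelerator aims to reduce.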