Narendra Singh Dhakad

3 papers

Novelty 50%

AI Score 42

Ranked #85,833 of 201,326 authors (top 43%); #336 in AR (top 44%)

3 Papers

20.0 ET Mar 27

First Demonstration of 28 nm Fabricated FeFET-Based Nonvolatile 6T SRAM

Albi Mema, Simon Thomann, Narendra Singh Dhakad et al.

With the staggering increase of edge compute applications like Internet-of-Things (IoT) and artificial intelligence (AI), the demand for fast, energy-efficient on-chip memory is growing. While the fast and mature static random-access memory (SRAM) technology is the standard choice, its volatility requires a constant supply voltage to operate and store data. Especially in edge AI and IoT devices that often idle, the leakage power consumes a significant portion of the constrained power budget. For this, emerging non-volatile memory (NVM) technologies such as Resistive RAM and ferroelectric FET (FeFET) offer zero-standby power consumption but suffer from integration and performance tradeoffs. To harness the benefits of the different technologies, hybrid architectures have been proposed, combining SRAM with NVM devices. This work proposes a hybrid non-volatile SRAM (nvSRAM) architecture based on recently demonstrated PMOS FeFETs (p-FeFETs). By replacing the two PMOS pull-up transistors with p-FeFETs, we achieve non-volatility without additional transistors. The design supports seamless power-down and restore operation, thus eliminating standby leakage. SPICE simulations in a commercial 28 nm technology show read latency comparable to conventional SRAM, and on-silicon measurements show robust restore behavior. With this, we are the first to demonstrate a fabricated 6T nvSRAM cell design. The resulting cell achieves an area footprint of 99 $\mu m^2$. The read path remains identical to baseline SRAM, enabling high-speed operation while being non-volatile, making it ideal for IoT and edge systems.

12.1 AR May 15

ADS-IMC: Accelerating Data Sorting with In-Memory Computation

Narendra Singh Dhakad, Santosh Kumar Vishvakarma

Sorting is a fundamental operation across numerous computational domains. Traditionally, this process involves transferring data from main memory to a processing unit for sorting, followed by writing the sorted data back to memory. This conventional approach incurs substantial latency and energy overheads due to the extensive data movement between memory and processing components. To mitigate these overheads, this paper introduces novel architectures for executing sorting operations directly within the memory fabric, eliminating the need for off-chip data transfer. To our knowledge, this work represents the first exploration of in-memory sorting using 6T SRAM. The proposed architecture is designed to operate on data represented in the standard weighted binary radix format commonly used in digital systems. The proposed architecture achieves a significant 3.4x reduction in latency compared to memristor-based IMC sorting.
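The abstract does not spell out the sorting algorithm, so the following is only an illustrative sketch of one common way to sort weighted binary data by scanning bit columns, which is the kind of access pattern an in-memory design can exploit. It is an assumption for illustration and not the ADS-IMC method itself; the function names `bit_serial_min_index` and `bit_serial_sort` are hypothetical. The minimum is found by scanning from the most significant bit to the least significant bit and discarding every candidate row that holds a 1 whenever at least one candidate holds a 0.

```python
def bit_serial_min_index(words, n_bits, active):
    """Return the row index of the smallest word among the `active` rows.

    Scans bit columns MSB to LSB. At each column, if any candidate row has a 0,
    rows with a 1 cannot be the minimum and are dropped.
    Hypothetical helper, not taken from the paper.
    """
    candidates = set(active)
    for bit in range(n_bits - 1, -1, -1):
        zeros = {i for i in candidates if not (words[i] >> bit) & 1}
        if zeros:
            candidates = zeros
    # Remaining candidates all hold the same value; pick the lowest row index.
    return min(candidates)


def bit_serial_sort(words, n_bits):
    """Sort unsigned integers in ascending order by repeated minimum search."""
    remaining = set(range(len(words)))
    result = []
    while remaining:
        i = bit_serial_min_index(words, n_bits, remaining)
        result.append(words[i])
        remaining.remove(i)
    return result


sorted_words = bit_serial_sort([5, 3, 7, 3, 0], n_bits=3)
```

Each minimum search touches every bit column once, so the sketch costs on the order of `n_bits` column reads per output element regardless of how the rows are ordered.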

47.3 AR May 15

SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware

Narendra Singh Dhakad, Santosh Kumar Vishvakarma

This paper presents a novel architecture utilizing a 10T SRAM cell for XNOR-based in-memory computing, aimed at mitigating the extensive routing challenges typically encountered in conventional in-memory computing systems. By integrating a full adder between in-memory multiplication cells, the proposed design achieves a 50% reduction in routing complexity. The architecture performs multiply-accumulate (MAC) operations using XNOR computation optimized for binary neural networks (BNNs). Additionally, a 14T-based full adder is employed to construct an N-bit ripple carry adder in the adder tree, significantly reducing the area compared to traditional 28T-based CMOS designs. The 10T SRAM XNOR computation further reduces the latency of MAC operations. The proposed approach reduces the latency and area overhead, improving the overall hardware's area efficiency by 2.67x compared to the state-of-the-art.
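As a software-level illustration of the two building blocks named in the abstract, the sketch below shows an XNOR-and-popcount MAC for binary (+1/-1) vectors and an N-bit ripple carry adder built from a gate-level full adder. It models the arithmetic only and says nothing about the 10T cell or the 14T adder circuit; the function names and the bit encoding (1 for +1, 0 for -1) are assumptions made for this example.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer (bit i = 1 means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_mac(activations, weights, n_bits):
    """Dot product of two +1/-1 vectors packed as n_bits-wide integers.

    XNOR gives 1 where the signs match (product +1) and 0 where they differ
    (product -1), so the dot product is matches - mismatches = 2*matches - n_bits.
    """
    mask = (1 << n_bits) - 1
    xnor = ~(activations ^ weights) & mask
    matches = bin(xnor).count("1")
    return 2 * matches - n_bits


def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def ripple_carry_add(x_bits, y_bits):
    """Add two equal-length LSB-first bit lists; returns N+1 bits, LSB first."""
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out


def to_bits(value, n_bits):
    """Convert a non-negative integer to an LSB-first list of n_bits bits."""
    return [(value >> i) & 1 for i in range(n_bits)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))


a = [1, -1, 1, 1]
w = [1, 1, -1, 1]
mac = xnor_popcount_mac(pack_bits(a), pack_bits(w), n_bits=4)
total = from_bits(ripple_carry_add(to_bits(5, 4), to_bits(11, 4)))
```

Here `mac` equals the ordinary dot product of `a` and `w`, and `total` equals 5 + 11, with the carry out of the last full adder supplying the extra result bit.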