Sangeet Saha

AR
3papers
20citations
Novelty28%
AI Score20

3 Papers

ARJul 2, 2024
Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Xuqi Zhu, Huaizhi Zhang, JunKyu Lee et al.

Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable and energy efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of the NNs. We firstly streamline inter-layer and intra-layer redundancies of MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient scalable approximate matrix multiplication module termed "Approximate Multiplication Unit (AMU)". The AMU optimizes LUT-based matrix multiplications further through dedicated memory management and access design, decoupling computational overhead from input resolution and boosting FPGA-based NN accelerator efficiency significantly. The experimental results show that using our AMU achieves up to 9x higher throughput and 112x higher energy efficiency over the state-of-the-art solutions for the FPGA-based Quantised Neural Network (QNN) accelerators.

CVJul 5, 2018
MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip

Somdip Dey, Grigorios Kalliatakis, Sangeet Saha et al.

Intelligent Transportation Systems (ITS) have become an important pillar in modern "smart city" framework which demands intelligent involvement of machines. Traffic load recognition can be categorized as an important and challenging issue for such systems. Recently, Convolutional Neural Network (CNN) models have drawn considerable amount of interest in many areas such as weather classification, human rights violation detection through images, due to its accurate prediction capabilities. This work tackles real-life traffic load recognition problem on System-On-a-Programmable-Chip (SOPC) platform and coin it as MAT-CNN- SOPC, which uses an intelligent re-training mechanism of the CNN with known environments. The proposed methodology is capable of enhancing the efficacy of the approach by 2.44x in comparison to the state-of-art and proven through experimental analysis. We have also introduced a mathematical equation, which is capable of quantifying the suitability of using different CNN models over the other for a particular application based implementation.

ARJan 15, 2014
Performance Evaluation of ECC in Single and Multi Processor Architectures on FPGA Based Embedded System

Sruti Agarwal, Sangeet Saha, Rourab Paul et al.

Cryptographic algorithms are computationally costly and the challenge is more if we need to execute them in resource constrained embedded systems. Field Programmable Gate Arrays (FPGAs) having programmable logic de- vices and processing cores, have proven to be highly feasible implementation platforms for embedded systems providing lesser design time and reconfig- urability. Design parameters like throughput, resource utilization and power requirements are the key issues. The popular Elliptic Curve Cryptography (ECC), which is superior over other public-key crypto-systems like RSA in many ways, such as providing greater security for a smaller key size, is cho- sen in this work and the possibilities of its implementation in FPGA based embedded systems for both single and dual processor core architectures in- volving task parallelization have been explored. This exploration, which is first of its kind considering the other existing works, is a needed activity for evaluating the best possible architectural environment for ECC implementa- tion on FPGA (Virtex4 XC4VFX12, FF668, -10) based embedded platform.