NESep 22, 2022
Optimization of FPGA-based CNN Accelerators Using MetaheuristicsSadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri et al.
In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made general CPUs unable to deliver the desired real-time performance. At the same time, FPGAs have seen a surge in interest for accelerating CNN inference. This is due to their ability to create custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt compared to GPUs. The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each of which is tailored for a subset of layers. However, the growing complexity of CNN architectures makes optimizing the resources available on the target FPGA device to deliver optimal performance more challenging. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics for partitioning available FPGA resources to design a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations to achieve optimal performance on a given target FPGA device. Here, the focus is on the key specifications and hardware resources, including digital signal processors, block RAMs, and off-chip memory bandwidth. Experimental results and comparisons using four well-known benchmark CNNs are presented demonstrating that the proposed acceleration framework is both encouraging and promising. The SA-/TS-based Multi-CLP achieves 1.31x - 2.37x higher throughput than the state-of-the-art Single-/Multi-CLP approaches in accelerating AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
NEMar 22, 2022
FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point RepresentationAhmad Shawahna, Sadiq M. Sait, Aiman El-Maleh et al.
Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with the state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. Now-a-days, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and development of dedicated deep learning (DL) hardware to improve the throughput and energy efficiency of DNNs. Low-precision representation of DNN data-structures through quantization would bring great benefits to specialized DL hardware. However, the rigorous quantization leads to a severe accuracy drop. As such, quantization opens a large hyper-parameter space at bit-precision levels, the exploration of which is a major challenge. In this paper, we propose a novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet) that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment. Specifically, the FxP-QNet gradually adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements. Additionally, it employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers. Examining FxP-QNet on state-of-the-art architectures and the benchmark ImageNet dataset, we empirically demonstrate the effectiveness of FxP-QNet in achieving the accuracy-compression trade-off without the need for training. The results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively.
CRNov 2, 2019
Anomaly Detection for Industrial Control Networks using Machine Learning with the help from the Inter-Arrival CurvesBasem AL-Madani, Ahmad Shawahna, Mohammad Qureshi
Industrial Control Networks (ICN) such as Supervisory Control and Data Acquisition (SCADA) systems are widely used in industries for monitoring and controlling physical processes. These industries include power generation and supply, gas and oil production and delivery, water and waste management, telecommunication and transport facilities. The integration of internet exposes these systems to cyber threats. The consequences of compromised ICN are determine for a country economic and functional sustainability. Therefore, enforcing security and ensuring correctness operation became one of the biggest concerns for Industrial Control Systems (ICS), and need to be addressed. In this paper, we propose an anomaly detection approach for ICN using the physical properties of the system. We have developed operational baseline of electricity generation process and reduced the feature set using greedy and genetic feature selection algorithms. The classification is done based on Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and C4.5 decision tree with the help from the inter-arrival curves. The results show that the proposed approach successfully detects anomalies with a high degree of accuracy. In addition, they proved that SVM and C4.5 produces accurate results even for high sensitivity attacks when they used with the inter-arrival curves. As compared to this, k-NN is unable to produce good results for low and medium sensitivity attacks test cases.
MMNov 1, 2019
JPEG Image Compression using the Discrete Cosine Transform: An Overview, Applications, and Hardware ImplementationAhmad Shawahna, Md. Enamul Haque, Alaaeldin Amin
Digital images are becoming large in size containing more information day by day to represent the as is state of the original one due to the availability of high resolution digital cameras, smartphones, and medical tests images. Therefore, we need to come up with some technique to convert these images into smaller size without loosing much information from the actual. There are both lossy and lossless image compression format available and JPEG is one of the popular lossy compression among them. In this paper, we present the architecture and implementation of JPEG compression using VHDL (VHSIC Hardware Description Language) and compare the performance with some contemporary implementation. JPEG compression takes place in five steps with color space conversion, down sampling, discrete cosine transformation (DCT), quantization, and entropy encoding. The five steps cover for the compression purpose only. Additionally, we implement the reverse order in VHDL to get the original image back. We use optimized matrix multiplication and quantization for DCT to achieve better performance. Our experimental results show that significant amount of compression ratio has been achieved with very little change in the images, which is barely noticeable to human eye.
NEJan 1, 2019
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A ReviewAhmad Shawahna, Sadiq M. Sait, Aiman El-Maleh
Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolution neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general CPUs fail to achieve desired performance levels. Consequently, hardware accelerators that use application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and graphic processing units (GPUs) have been employed to improve the throughput of CNNs. More precisely, FPGAs have been recently adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism as well as due to their energy efficiency. In this paper, we review recent existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques for improving the acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNNs acceleration. The techniques investigated in this paper represent the recent trends in FPGA-based accelerators of deep learning networks. Thus, this review is expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.