LG · May 10
End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor
Wiktor Matykiewicz, Piotr Wzorek, Kamil Jeziorek et al.
With the rapid growth of mobile robotics and embedded intelligence, there is an increasing demand for efficient on-device data processing on edge platforms. A promising research direction is the use of neuromorphic sensors inspired by human sensory systems, which generate sparse, event-based data encoding changes in the environment. In this work, we present the first end-to-end FPGA implementation of a keyword spotting system that integrates a Neuromorphic Auditory Sensor (NAS) and a graph neural network (GNN) on a single FPGA device, enabling real-time processing of raw audio data. The proposed architecture eliminates conventional signal preprocessing and operates directly on event-based audio streams. Leveraging a compute-near-memory network architecture, the system achieves efficient inference with low latency and low power consumption. Experimental results demonstrate an accuracy of 87.43% after quantization on the Google Speech Commands v2 dataset processed through the neuromorphic sensor, with end-to-end latency below 35 µs and average power consumption of 1.12 W. The processed datasets, software models, and hardware modules are available at https://github.com/vision-agh/NAS-GNN-KWS.
LG · May 20, 2021
Wide & Deep neural network model for patch aggregation in CNN-based prostate cancer detection systems
Lourdes Duran-Lopez, Juan P. Dominguez-Morales, Daniel Gutierrez-Galan et al.
Prostate cancer (PCa) is one of the most commonly diagnosed cancers and one of the leading causes of death among men, with almost 1.41 million new cases and around 375,000 deaths in 2020. Artificial Intelligence algorithms have had a huge impact on medical image analysis, including digital histopathology, where Convolutional Neural Networks (CNNs) are used to provide a fast and accurate diagnosis, supporting experts in this task. To perform an automatic diagnosis, prostate tissue samples are first digitized into gigapixel-resolution whole-slide images. Due to the size of these images, neural networks cannot use them as input; therefore, small subimages called patches are extracted and classified individually, yielding a patch-level classification. In this work, a novel patch aggregation method based on a custom Wide & Deep neural network model is presented, which performs a slide-level classification using the patch-level classes obtained from a CNN. The malignant tissue ratio, a 10-bin malignant probability histogram, the least squares regression line of the histogram, and the number of malignant connected components are used by the proposed model to perform the classification. An accuracy of 94.24% and a sensitivity of 98.87% were achieved, indicating that the proposed system could aid pathologists by speeding up the screening process and, thus, contribute to the fight against PCa.
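As a rough illustration of three of the slide-level features named in the abstract above (malignant tissue ratio, 10-bin probability histogram, and the least squares line of that histogram), the following minimal sketch may help. It is not the authors' code: the function name `slide_features`, the 0.5 decision threshold, the equal-width bins over [0, 1], and the choice of bin index as the regression variable are all assumptions made for illustration. The connected-components feature is omitted because it needs the spatial layout of the patches.

```python
def slide_features(probs, threshold=0.5, bins=10):
    """Summarize patch-level malignant probabilities for one slide.

    Hypothetical helper: threshold and binning are illustrative assumptions.
    """
    if not probs:
        raise ValueError("need at least one patch probability")
    if bins < 2:
        raise ValueError("need at least two bins to fit a line")

    # Fraction of patches whose probability meets the assumed threshold.
    ratio = sum(p >= threshold for p in probs) / len(probs)

    # Equal-width histogram over [0, 1]; p == 1.0 falls into the last bin.
    hist = [0] * bins
    for p in probs:
        hist[min(int(p * bins), bins - 1)] += 1

    # Ordinary least squares line through the points (bin index, count).
    mean_x = (bins - 1) / 2
    mean_y = sum(hist) / bins
    sxx = sum((x - mean_x) ** 2 for x in range(bins))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(hist))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return ratio, hist, slope, intercept
```

Under these assumptions, a positive slope means the patch probabilities are skewed toward the malignant end of the histogram, which is one plausible reason such a line could be informative for a slide-level classifier.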
CV · May 17, 2019
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Alejandro Linares-Barranco, Antonio Rios-Navarro, Ricardo Tapiador-Morales et al.
Deep learning is a cutting-edge technique that is being applied in many fields. For vision applications, Convolutional Neural Networks (CNNs) achieve significant accuracy in classification tasks. Numerous hardware accelerators have appeared in recent years to improve on CPU- or GPU-based solutions. This technology is commonly prototyped and tested on FPGAs before being considered for ASIC fabrication for mass production. The use of typical commercial cameras (30 fps) limits the capabilities of these systems for high-speed applications. Dynamic vision sensors (DVS), which emulate the behavior of a biological retina, are becoming increasingly important for these applications because of their nature: the information is represented by a continuous stream of spikes, and the frames to be processed by the CNN are constructed by collecting a fixed number of these spikes (called events). The faster an object moves, the more events the DVS produces, and so the higher the equivalent frame rate. The use of a DVS therefore allows frames to be computed at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined FPGA design that collects events from an Address-Event-Representation (AER) DVS retina to obtain a normalized histogram to be used by a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS is used for the computation blocks that perform the frame normalization needed by the CNN. Results outperform previous implementations of frame collection and normalization using ARM processors running at 800 MHz on a Zynq7100 in both latency and power consumption. A measured 67% speedup factor is presented for a Roshambo CNN real-time experiment running at a 160 fps peak rate.
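The frame construction described in the abstract above (collect a fixed number of events into a per-pixel histogram, then normalize it) can be sketched in software as follows. This is only an illustrative model, not the paper's VHDL/HLS design: the function name `frames_from_events`, the `(x, y)` event format, and normalization by the peak pixel count are assumptions, since the abstract does not specify the normalization used.

```python
def frames_from_events(events, width, height, events_per_frame):
    """Yield normalized histogram frames, each built from a fixed number of events.

    Hypothetical sketch: events are (x, y) pixel addresses, and each frame is
    scaled by its peak count (an assumed normalization).
    """
    if events_per_frame < 1:
        raise ValueError("events_per_frame must be at least 1")

    counts = [[0] * width for _ in range(height)]
    collected = 0
    for x, y in events:
        counts[y][x] += 1
        collected += 1
        if collected == events_per_frame:
            # Scale so the most active pixel maps to 1.0.
            peak = max(max(row) for row in counts)
            yield [[c / peak for c in row] for row in counts]
            # Start a fresh histogram for the next frame.
            counts = [[0] * width for _ in range(height)]
            collected = 0
    # Events left over at the end (fewer than events_per_frame) are dropped.
```

Because a frame is emitted whenever the event budget is reached rather than at fixed time intervals, a faster-moving object fills the budget sooner, which is the "equivalent frame rate" effect the abstract refers to.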
AS · Apr 30, 2019
Interfacing PDM MEMS microphones with PFM spiking systems: Application for Neuromorphic Auditory Sensors
Angel Jimenez-Fernandez, Daniel Gutierrez-Galan, Antonio Rios-Navarro et al.
In neuromorphic engineering, computation is commonly performed asynchronously, mimicking the way in which nervous systems process information: spike by spike. The Neuromorphic Auditory Sensor (NAS) has been implemented under this principle, applying different spike-based signal processing blocks. Computation in the spike domain requires the conversion of signals from an analog or digital representation to the spike domain, which can impose a speed constraint in many cases. This paper presents a spike-based system to convert audio information from low-power pulse density modulation (PDM) MicroElectroMechanical Systems (MEMS) microphones into rate-coded spike frequencies. These spikes form the input signal of the NAS, avoiding the analog- or digital-to-spike conversion and therefore improving the time response of the NAS. This conversion has been implemented in VHDL as an interface for PDM microphones, known as the "PDM to spikes interface" (PSI), which converts their pulses into temporally distributed spikes following a pulse frequency modulation (PFM) scheme with an accurate Inter-Spike-Interval. The PSI was tested in two scenarios: first as a stand-alone circuit for its characterization, and then integrated with a NAS for verification. The PSI achieves a Total Harmonic Distortion (THD) of -39.51 dB and a Signal-to-Noise Ratio (SNR) of 59.12 dB, demands less than 1% of the resources of a Spartan-6 FPGA, and has a power consumption below 5 mW.
CV · Jun 5, 2017
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Alessandro Aimar, Hesham Mostafa, Enrico Calabrese et al.
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.