AR · Mar 30, 2023
HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
Petros Toupas, Alexander Montgomerie-Corcoran, Christos-Savvas Bouganis et al.
For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks have proven highly effective, achieving state-of-the-art results. This study introduces a novel toolflow, based on a streaming architecture, for mapping such models onto FPGAs while taking into account both the model's inherent characteristics and the features of the target FPGA device. HARFLOW3D takes as input a 3D CNN in ONNX format and a description of the FPGA's characteristics, and generates a design that minimizes the latency of the computation. The toolflow comprises several parts: i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, and v) an automated mapping to synthesizable code for FPGAs. Experiments on various pairs of 3D CNNs and FPGA systems show that the toolflow supports a broad range of models and devices. Furthermore, the toolflow has produced high-performing results for 3D CNN models that had not previously been mapped to FPGAs, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D delivers latency competitive with a range of state-of-the-art hand-tuned approaches, achieving up to 5× better performance than some existing works.
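The performance model in item ii) has to estimate how long each 3D convolution takes. As a rough illustration only, and not HARFLOW3D's actual model (which also accounts for resources and memory), the sketch below counts multiply-accumulates (MACs) for a dense 3D convolution and converts them to latency. It assumes a hypothetical compute-bound engine that retires a fixed number of MACs per cycle, with layers executed one after another on the same hardware; `conv3d_macs`, `layer_cycles` and `total_latency_ms` are illustrative names.

```python
def conv3d_macs(c_in: int, c_out: int,
                kernel: tuple[int, int, int],
                out_shape: tuple[int, int, int]) -> int:
    """MACs of a dense (groups=1) 3D convolution.

    kernel = (kT, kH, kW); out_shape = (T, H, W) of the output feature map.
    """
    kt, kh, kw = kernel
    t, h, w = out_shape
    # Every output voxel of every output channel needs c_in * kT * kH * kW MACs.
    return c_out * t * h * w * c_in * kt * kh * kw


def layer_cycles(macs: int, macs_per_cycle: int) -> int:
    """Ideal compute-bound cycle count (ceiling division); ignores memory stalls."""
    return -(-macs // macs_per_cycle)


def total_latency_ms(layer_macs: list[int], macs_per_cycle: int,
                     clock_mhz: float) -> float:
    """Assumes layers run sequentially on shared hardware, so their cycles add up."""
    cycles = sum(layer_cycles(m, macs_per_cycle) for m in layer_macs)
    return cycles / (clock_mhz * 1e3)  # clock_mhz * 1e3 = cycles per millisecond


macs = conv3d_macs(64, 64, (3, 3, 3), (8, 56, 56))
print(macs, total_latency_ms([macs], macs_per_cycle=1024, clock_mhz=200.0))
```

A real toolflow would replace the fixed `macs_per_cycle` with per-layer parallelism choices constrained by DSP, BRAM and bandwidth budgets, which is where the optimization engine comes in.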
CV · Nov 26, 2025
Continual Error Correction on Low-Resource Devices
Kirill Paramonov, Mete Ozay, Aristeidis Mystakidis et al.
The proliferation of AI models in everyday devices has highlighted a critical challenge: prediction errors that degrade user experience. While existing solutions focus on error detection, they rarely provide efficient correction mechanisms, especially for resource-constrained devices. We present a novel system that enables users to correct AI misclassifications through few-shot learning while requiring minimal computational resources and storage. Our approach combines server-side foundation model training with on-device prototype-based classification, so that errors are corrected by updating prototypes rather than retraining the model. The system consists of two key components: (1) a server-side pipeline that uses knowledge distillation to transfer robust feature representations from foundation models to device-compatible architectures, and (2) a device-side mechanism that enables ultra-efficient error correction through prototype adaptation. We demonstrate the system's effectiveness on both image classification and object detection tasks, correcting over 50% of errors in one-shot scenarios on the Food-101 and Flowers-102 datasets while maintaining minimal forgetting (less than 0.02%) and negligible computational overhead. Our implementation, validated through an Android demonstration app, shows the system's practicality in real-world scenarios.
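The device-side idea of correcting errors by updating prototypes instead of retraining can be sketched as a nearest-class-mean classifier. This is a minimal sketch under stated assumptions: embeddings come from a frozen, distilled feature extractor (represented here by plain vectors), distance is Euclidean, and a correction folds the misclassified sample into its true class's running mean. `PrototypeClassifier` and this update rule are illustrative, not the paper's exact formulation.

```python
import numpy as np


class PrototypeClassifier:
    """Nearest-class-mean classifier over embeddings from an assumed frozen backbone."""

    def __init__(self) -> None:
        self.sums: dict[str, np.ndarray] = {}  # per-class sum of embeddings
        self.counts: dict[str, int] = {}       # per-class number of embeddings

    def add_examples(self, label: str, embeddings) -> None:
        emb = np.atleast_2d(np.asarray(embeddings, dtype=np.float64))
        if label in self.sums:
            self.sums[label] += emb.sum(axis=0)
            self.counts[label] += emb.shape[0]
        else:
            self.sums[label] = emb.sum(axis=0)
            self.counts[label] = emb.shape[0]

    def prototypes(self) -> dict[str, np.ndarray]:
        # Prototype = mean embedding of each class.
        return {k: self.sums[k] / self.counts[k] for k in self.sums}

    def predict(self, embedding) -> str:
        x = np.asarray(embedding, dtype=np.float64)
        protos = self.prototypes()
        return min(protos, key=lambda k: np.linalg.norm(x - protos[k]))

    def correct(self, embedding, true_label: str) -> None:
        """User correction: update only the true class's prototype; no gradient steps."""
        self.add_examples(true_label, embedding)


clf = PrototypeClassifier()
clf.add_examples("cat", [[1.0, 0.0], [0.9, 0.1]])
clf.add_examples("dog", [[0.0, 1.0]])
query = [0.6, 0.6]
print(clf.predict(query))  # -> cat (the misclassification)
clf.correct(query, "dog")  # one-shot correction
print(clf.predict(query))  # -> dog
```

Because only per-class sums and counts are stored, a correction costs one vector addition and leaves every other class's prototype untouched, which is one plausible reason such schemes forget little.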
AR · May 19
A Hardware-Based Multi-Stage Dynamic Power Management Architecture for Autonomous Low-Light Operation
Charalampos S. Kouzinopoulos, Marcel L. Meli, Martin Schellenberg et al.
The advancement of autonomous Smart Sensor Networks and embedded systems for the Internet of Things that are powered by photovoltaic energy harvesting is severely limited by energy efficiency, especially in low-light environments. While Dynamic Power Management is essential for energy conservation, conventional software-based techniques that rely on processor-managed low-power states incur a persistent quiescent current drain. In energy-scarce conditions this current becomes the dominant energy sink, limiting autonomy. This paper addresses this limitation by introducing a robust, hardware-orchestrated dynamic power management architecture that improves on existing configurations for battery-based sensor nodes. The proposed architecture achieves a quiescent drain of only 452 nA by completely power-gating the microcontroller and all non-essential peripherals, with wake-up orchestrated by an ultra-low-power PMIC, an RTC, and a novel latch circuit developed specifically for this work. Our evaluation demonstrates that the dynamic power management architecture is significantly more efficient than traditional software-based sleep modes.
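To put the 452 nA figure in perspective, the charge drawn by the quiescent current alone over one day follows directly from Q = I · t:

```latex
Q_{\text{day}} = I_q \, t = 452\,\text{nA} \times 24\,\text{h} = 10\,848\,\text{nAh} \approx 10.85\,\mu\text{Ah}
```

For a hypothetical 100 mAh storage element, this standby drain by itself would need roughly 100 000 µAh / 10.85 µAh ≈ 9 200 days to exhaust it, ignoring self-discharge and the energy spent in active phases; the real budget depends on the node's duty cycle and harvesting conditions.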
AI · Oct 21, 2024
Multi-Sensor Fusion for UAV Classification Based on Feature Maps of Image and Radar Data
Nikos Sakellariou, Antonios Lalas, Konstantinos Votis et al.
The unique cost, flexibility, speed, and efficiency of modern UAVs make them an attractive choice for many applications in contemporary society. However, this has also led to an ever-increasing number of reported malicious or accidental incidents, making the development of UAV detection and classification mechanisms essential. We propose a methodology for developing a system that fuses already-processed multi-sensor data in a new Deep Neural Network (DNN) to increase classification accuracy for UAV detection. The DNN fuses high-level features extracted from individual object detection and classification models associated with thermal, optronic, and radar data. Particular emphasis is given to the model's Convolutional Neural Network (CNN)-based architecture, which combines the features of the three sensor modalities by stacking the extracted image features of the thermal and optronic sensors, achieving higher classification accuracy than any single sensor alone.
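The fusion step stacks the thermal and optronic image features. A minimal NumPy sketch, assuming both feature maps are laid out as (channels, height, width) and share the same spatial size. The abstract does not say how the radar features enter the network, so appending them after a global-average-pooling stand-in for the CNN is a hypothetical choice; `stack_image_features` and `fuse` are illustrative names.

```python
import numpy as np


def stack_image_features(thermal: np.ndarray, optronic: np.ndarray) -> np.ndarray:
    """Stack two (C, H, W) feature maps along the channel axis -> (C1 + C2, H, W)."""
    if thermal.shape[1:] != optronic.shape[1:]:
        raise ValueError("thermal and optronic maps must share spatial dimensions")
    return np.concatenate([thermal, optronic], axis=0)


def fuse(thermal: np.ndarray, optronic: np.ndarray, radar: np.ndarray) -> np.ndarray:
    """Hypothetical fusion vector: pooled stacked image features followed by radar features."""
    stacked = stack_image_features(thermal, optronic)
    # Stand-in for the fusion CNN: average each channel over its spatial positions.
    pooled = stacked.mean(axis=(1, 2))
    return np.concatenate([pooled, radar.ravel()])


rng = np.random.default_rng(0)
fused = fuse(rng.random((4, 7, 7)), rng.random((6, 7, 7)), rng.random(3))
print(fused.shape)  # (13,)
```

Stacking along channels lets the first convolution of the fusion network see thermal and optronic responses at the same spatial location together, which is the usual motivation for early feature-level fusion.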
AR · Mar 27, 2024
SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction
Petros Toupas, Zhewen Yu, Christos-Savvas Bouganis et al.
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in numerous vision tasks. However, their high processing requirements necessitate efficient hardware acceleration to meet application performance targets. In the space of FPGAs, streaming-based dataflow architectures are often adopted, as significant performance gains can be achieved through layer-wise pipelining and reduced off-chip memory access by retaining data on-chip. However, modern topologies such as UNet, YOLO, and X3D use long skip connections, which require significant on-chip storage and thus limit the performance such architectures can achieve. This paper addresses this limitation by introducing mechanisms that evict weights and activations to off-chip memory along the computational pipeline, taking into account the available compute and memory resources. The proposed mechanism is incorporated into an existing toolflow, expanding the design space by using off-chip memory as a buffer. This enables such modern CNNs to be mapped onto devices with limited on-chip memory under the streaming architecture design approach. SMOF delivers competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks, achieving up to 10.65× throughput improvement compared to previous works.
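The core trade-off is between on-chip storage and off-chip bandwidth. The sketch below is a hypothetical greedy heuristic, not SMOF's actual algorithm: each buffer (for example a long skip connection's activations or a layer's weights) has an on-chip size and the off-chip traffic it would add if evicted, and buffers are evicted largest-first until the design fits, as long as the remaining bandwidth allows. The `Buffer` fields and the budgets are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size_bytes: int        # on-chip storage needed if the buffer stays on-chip
    bandwidth_bps: float   # extra off-chip traffic (bytes/s) if it is evicted


def choose_evictions(buffers: list[Buffer], on_chip_budget: int,
                     bandwidth_budget: float) -> list[str]:
    """Greedily pick buffers to move off-chip until on-chip storage fits the budget."""
    used = sum(b.size_bytes for b in buffers)
    spare_bw = bandwidth_budget
    evicted: list[str] = []
    # Largest buffers first: each eviction frees as much on-chip memory as possible.
    for b in sorted(buffers, key=lambda b: b.size_bytes, reverse=True):
        if used <= on_chip_budget:
            break
        if b.bandwidth_bps <= spare_bw:  # only evict if off-chip bandwidth remains
            evicted.append(b.name)
            used -= b.size_bytes
            spare_bw -= b.bandwidth_bps
    if used > on_chip_budget:
        raise ValueError("no feasible eviction set found under the bandwidth budget")
    return evicted


bufs = [Buffer("skip1", 800, 5.0), Buffer("skip2", 300, 2.0), Buffer("weights", 500, 10.0)]
print(choose_evictions(bufs, on_chip_budget=1000, bandwidth_budget=8.0))
```

A greedy pass is not guaranteed to find the best eviction set; an optimiser that also trades eviction against compute parallelism, as the abstract describes, explores a larger space.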
RO · Dec 6, 2023
From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception
Petros Toupas, Georgios Tsamis, Dimitrios Giakoumis et al.
Mobile service robots are proving increasingly effective in a range of applications, such as healthcare, monitoring Activities of Daily Living (ADL), and facilitating Ambient Assisted Living (AAL). These robots rely heavily on Human Action Recognition (HAR) to interpret human actions and intentions. However, for HAR to function effectively on service robots, it requires prior knowledge of human presence (human detection) and identification of the individuals to monitor (human tracking). In this work, we propose an end-to-end pipeline that encompasses the entire process, from human detection and tracking through to action recognition. The pipeline is designed to operate in near real-time while performing all stages of processing on the edge, reducing the need for centralised computation. To identify the most suitable models for our mobile robot, we conducted a series of experiments comparing state-of-the-art solutions in terms of both detection performance and efficiency. To evaluate the effectiveness of the proposed pipeline, we introduce a dataset comprising daily household activities. By presenting our findings and analysing the results, we demonstrate the efficacy of our approach in enabling mobile robots to understand and respond to human behaviour in real-world scenarios, relying mainly on data from their RGB cameras.
AR · May 31, 2023
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs
Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras
Surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval are just a few of the many applications in which 3D Convolutional Neural Networks are exploited. However, their extensive use is restricted by their high computational and memory requirements, especially when integrated into systems with limited resources. This study proposes a toolflow that optimises the mapping of 3D CNN models for Human Action Recognition onto FPGA devices, taking into account FPGA resources and off-chip memory characteristics. The proposed system employs Synchronous Dataflow (SDF) graphs to model the designs and introduces transformations to expand and explore the design space, resulting in high-throughput designs. A variety of 3D CNN models were evaluated using the proposed toolflow on multiple FPGA devices, demonstrating its potential to deliver competitive performance compared to earlier hand-tuned and model-specific designs.
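In a streaming design modelled as a dataflow graph, all layers work concurrently once the pipeline is full, so steady-state throughput is set by the slowest stage. A simplified sketch under that assumption: one clip per firing, no FIFO-depth or off-chip bandwidth effects (which a full toolflow would also need to model). The function names and numbers are illustrative.

```python
def stage_cycles(macs_per_clip: int, parallelism: int) -> int:
    """Cycles a layer needs per input clip when it performs `parallelism` MACs per cycle."""
    return -(-macs_per_clip // parallelism)


def pipeline_throughput(macs: list[int], parallelism: list[int],
                        clock_hz: float) -> tuple[float, int]:
    """Return (clips per second, index of the bottleneck layer) for a fully pipelined design."""
    cycles = [stage_cycles(m, p) for m, p in zip(macs, parallelism)]
    # The slowest stage bounds the rate at which the whole pipeline can accept clips.
    bottleneck = max(range(len(cycles)), key=cycles.__getitem__)
    return clock_hz / cycles[bottleneck], bottleneck


layer_macs = [1_000_000, 4_000_000, 2_000_000]
rate, slowest = pipeline_throughput(layer_macs, [10, 20, 10], 200e6)
print(rate, slowest)
```

Raising the parallelism of the bottleneck layers (at the cost of more DSPs) and re-evaluating is the kind of design-space move such an optimiser explores; in this toy example, doubling the parallelism of layers 1 and 2 doubles the throughput.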
CV · May 29, 2023
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition
Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras
3D Convolutional Neural Networks are gaining increasing attention from researchers and practitioners and have found applications in many domains, such as surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval. However, their widespread adoption is hindered by their high computational and memory requirements, especially when resource-constrained systems are targeted. This paper addresses the problem of mapping X3D, a state-of-the-art model in Human Action Recognition that achieves 95.5% accuracy on the UCF101 benchmark, onto any FPGA device. The proposed toolflow generates an optimised stream-based hardware system, taking into account the available resources and off-chip memory characteristics of the FPGA device. The generated designs push the current performance-accuracy Pareto front further and, for the first time, make it possible to target such complex model architectures for the Human Action Recognition task.
CV · Feb 25, 2020
A Deep Learning Framework for Simulation and Defect Prediction Applied in Microelectronics
Nikolaos Dimitriou, Lampros Leontaris, Thanasis Vafeiadis et al.
The prediction of upcoming events in industrial processes has been a long-standing research goal, since it enables optimization of manufacturing parameters, planning of equipment maintenance and, more importantly, prediction and eventual prevention of defects. While existing approaches have made substantial progress, they are mostly limited to processing one-dimensional signals or require parameter tuning to model environmental parameters. In this paper, we propose an alternative approach based on deep neural networks that simulates changes in the 3D structure of a monitored object in a batch based on previous 3D measurements. In particular, we propose an architecture based on 3D Convolutional Neural Networks (3DCNN) to model the geometric variations in manufacturing parameters and predict upcoming events related to sub-optimal performance. We validate our framework on a microelectronics use case using the recently published PCB scans dataset, where we simulate changes in the shape and volume of glue deposited on a Liquid Crystal Polymer (LCP) substrate before the attachment of integrated circuits (ICs). Experimental evaluation examines the impact of different choices of cost function during training and shows that the proposed method can be used efficiently for defect prediction.
CV · Feb 25, 2020
Fault Diagnosis in Microelectronics Attachment via Deep Learning Analysis of 3D Laser Scans
Nikolaos Dimitriou, Lampros Leontaris, Thanasis Vafeiadis et al.
A common source of defects in manufacturing miniature Printed Circuit Boards (PCBs) is the attachment of silicon dies or other wire-bondable components on a Liquid Crystal Polymer (LCP) substrate. Typically, a conductive glue is dispensed prior to attachment, with defects caused by either insufficient or excessive glue. Current practice in the electronics industry is for a human operator to examine the deposited glue, a process that is both time-consuming and inefficient, especially in preproduction runs where the error rate is high. In this paper we propose a system that automates fault diagnosis by accurately estimating the volume of glue deposits before and even after die attachment. To this end, a modular scanning system is deployed that produces high-resolution point clouds, while the actual estimation of glue volume is performed by (R)egression-Net (RNet), a 3D Convolutional Neural Network (3DCNN). RNet outperforms other deep architectures and is able to estimate the volume either directly from the point cloud of a glue deposit or, more interestingly, after die attachment, when only a small part of the glue is visible around each die. The entire methodology is evaluated under operational conditions, where the proposed system achieves accurate results without delaying the manufacturing process.
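A 3D CNN such as RNet consumes a regular volumetric grid rather than a raw, unordered point cloud. The abstract does not specify RNet's input encoding, so the sketch below assumes a simple binary occupancy grid over a known bounding box; `voxelize` and its parameters are illustrative.

```python
import numpy as np


def voxelize(points, grid_size: int, bounds_min, bounds_max) -> np.ndarray:
    """Map an (N, 3) point cloud to a binary occupancy grid of shape (grid_size,) * 3.

    Assumes points lie inside [bounds_min, bounds_max] and that the box has non-zero extent.
    """
    pts = np.asarray(points, dtype=np.float64)
    lo = np.asarray(bounds_min, dtype=np.float64)
    hi = np.asarray(bounds_max, dtype=np.float64)
    # Normalise each coordinate to [0, 1] within the box, then scale to voxel indices.
    idx = np.floor((pts - lo) / (hi - lo) * grid_size).astype(int)
    # Points exactly on the upper bound would index grid_size; put them in the last voxel.
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid


grid = voxelize([[0, 0, 0], [1, 1, 1], [0.5, 0.5, 0.5]], 4, (0, 0, 0), (1, 1, 1))
x = grid[None, None]  # (batch, channel, D, H, W) layout used by common 3D CNN frameworks
print(grid.sum(), x.shape)
```

A regression network can then be trained on such grids with the measured glue volume as the target; richer encodings (point density per voxel, height maps) are common alternatives.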
CL · Oct 24, 2018
Image-based Natural Language Understanding Using 2D Convolutional Neural Networks
Erinc Merdivan, Anastasios Vafeiadis, Dimitrios Kalatzis et al.
We propose a new approach to natural language understanding in which we treat the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to extract semantically meaningful features from images with text without the optical character recognition and sequential processing pipelines that traditional Natural Language Understanding algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we outperformed state-of-the-art accuracy on non-Latin alphabet-based text classification and achieved promising results on eight text classification datasets. Furthermore, our approach outperformed memory networks when using out-of-vocabulary entities from task 4 of the bAbI dialog dataset.
NI · Jul 2, 2013
Security for Smart Mobile Networks: The NEMESYS Approach
Erol Gelenbe, Gokce Gorbil, Dimitrios Tzovaras et al.
The growing popularity of smart mobile devices such as smartphones and tablets has made them an attractive target for cyber-criminals, resulting in a rapidly growing and evolving mobile threat as attackers experiment with new business models by targeting mobile users. With the emergence of the first large-scale mobile botnets, the core network has also become vulnerable to distributed denial-of-service attacks such as the signaling attack. Furthermore, complementary access methods such as Wi-Fi and femtocells introduce additional vulnerabilities for the mobile users as well as the core network. In this paper, we present the NEMESYS approach to smart mobile network security. The goal of the NEMESYS project is to develop novel security technologies for seamless service provisioning in the smart mobile ecosystem, and to improve mobile network security through a better understanding of the threat landscape. To this purpose, NEMESYS will collect and analyze information about the nature of cyber-attacks targeting smart mobile devices and the core network so that appropriate counter-measures can be taken. We are developing a data collection infrastructure that incorporates virtualized mobile honeypots and honeyclients in order to gather, detect and provide early warning of mobile attacks and understand the modus operandi of cyber-criminals that target mobile devices. By correlating the extracted information with known attack patterns from wireline networks, we plan to reveal and identify the possible shift in the way that cyber-criminals launch attacks against smart mobile devices.
NI · May 23, 2013
NEMESYS: Enhanced Network Security for Seamless Service Provisioning in the Smart Mobile Ecosystem
Erol Gelenbe, Gokce Gorbil, Dimitrios Tzovaras et al.
As a consequence of the growing popularity of smart mobile devices, mobile malware is clearly on the rise, with attackers targeting valuable user information and exploiting vulnerabilities of the mobile ecosystems. With the emergence of large-scale mobile botnets, smartphones can also be used to launch attacks on mobile networks. The NEMESYS project will develop novel security technologies for seamless service provisioning in the smart mobile ecosystem, and improve mobile network security through better understanding of the threat landscape. NEMESYS will gather and analyze information about the nature of cyber-attacks targeting mobile users and the mobile network so that appropriate counter-measures can be taken. We will develop a data collection infrastructure that incorporates virtualized mobile honeypots and a honeyclient, to gather, detect and provide early warning of mobile attacks and better understand the modus operandi of cyber-criminals that target mobile devices. By correlating the extracted information with the known patterns of attacks from wireline networks, we will reveal and identify trends in the way that cyber-criminals launch attacks against mobile devices.