39.9 NE May 18 Code
Spiker-LL: An Energy-Efficient FPGA Accelerator Enabling Adaptive Local Learning in Spiking Neural Networks
Alessio Caviglia, Filippo Marostica, Alessandro Savino et al.
Deploying adaptive intelligence at the edge remains challenging due to the high computational and energy cost of training neural models. Spiking Neural Networks (SNNs) offer a promising alternative, but enabling on-device learning requires hardware-algorithm co-design. This paper presents SPIKER-LL, an FPGA-based SNN accelerator that extends the open-source Spiker+ inference architecture with efficient support for the STSF local learning rule. Through targeted microarchitectural extensions, SPIKER-LL performs inference and online learning with minimal overhead. Across MNIST, F-MNIST, and DIGITS, it achieves up to 93% accuracy, sub-millisecond latency, and less than 0.1 mJ per inference, while remaining DSP-free and highly scalable for edge-FPGA deployments.
64.8 NE May 14 Code
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework
Alessio Caviglia, Filippo Marostica, Roberta Bardini et al.
The rapid expansion of spiking neural networks (SNNs) has led to a proliferation of training algorithms that differ widely in biological inspiration, computational structure, and hardware suitability. Despite this progress, the field lacks a unified, fine-grained taxonomy that systematically organizes these approaches and clarifies their conceptual relationships. This survey provides a comprehensive taxonomy of SNN training algorithms, spanning surrogate-gradient backpropagation, local and three-factor learning rules, biologically inspired plasticity mechanisms, ANN-to-SNN conversion pipelines, and non-standard optimization strategies. We analyze each class in terms of its computational principles, learning signals, and locality properties. To support reproducible research, we release NeuroTrain, an open-source snnTorch-based framework that implements a representative set of these algorithms within a unified, modular, and extendable framework, enabling consistent benchmarking across datasets, architectures, and training regimes. By consolidating fragmented literature and providing a reusable benchmarking framework, this survey identifies common patterns, highlights open challenges, and outlines promising directions for future work on scalable, efficient SNN training.
44.7 RO May 15
GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Davide Buoso, Andrea Protopapa, Stefano Di Carlo et al.
Learning visuomotor policies from scarce expert demonstrations remains a core challenge in robotic manipulation. A primary hurdle lies in distilling high-dimensional RGB representations into control-relevant geometry without overfitting. While using frozen pre-trained Vision Foundation Models (VFMs) improves data efficiency, it also shifts most task adaptation onto a small spatial pooling module, which can latch onto task-irrelevant shortcuts and lose geometric grounding when finetuned with few data samples. More broadly, pre-trained visual representations used for policy learning have been observed to struggle under even minor scene perturbations, highlighting the need for robustness-oriented inductive biases. We propose Geometric Anchor Pre-training (GAP), a simple, action-free warm-up stage that regularizes the spatial adapter before downstream imitation learning. GAP pre-trains the pooling layer on a lightweight simulated proxy task where object masks are available at no cost, encouraging the adapter to produce keypoints that lie on the object, cover its spatial extent, and remain sharp and repeatable over time. This yields stable geometric anchors that provide a reliable coordinate interface for few-shot policy learning, while keeping the VFM frozen. We evaluate GAP on RoboMimic and ManiSkill under severe data scarcity (15-50 demonstrations) and domain shift. A simple adapter regularized with GAP consistently outperforms stronger attention-based poolers and end-to-end fine-tuning, achieving 62% success on RoboMimic Can with 15 demonstrations (+16% over AFA), 63% on the long-horizon high-precision Tool Hang task with 50 demonstrations, and 61% on ManiSkill StackCube with 30 demonstrations (+11% over full fine-tuning). The proxy stage is lightweight and fully decoupled from downstream tasks, making it practical to reuse across environments and manipulation skills.
SI Jul 10, 2024
Can social media shape the security of next-generation connected vehicles?
Nicola Scarano, Luca Mannella, Alessandro Savino et al.
The increasing adoption of connectivity and electronic components in vehicles makes these systems valuable targets for attackers. While automotive vendors prioritize safety, there remains a critical need for comprehensive assessment and analysis of cyber risks. In this context, this paper proposes a Social Media Automotive Threat Intelligence (SOCMATI) framework, specifically designed for the emerging field of automotive cybersecurity. The framework leverages advanced intelligence techniques and machine learning models to extract valuable insights from social media. Four use cases illustrate the framework's potential by demonstrating how it can significantly enhance threat assessment procedures within the automotive industry.
3.8 OS Mar 26
Experimental Analysis of FreeRTOS Dependability through Targeted Fault Injection Campaigns
Luca Mannella, Stefano Di Carlo, Alessandro Savino
Real-Time Operating Systems (RTOSes) play a crucial role in safety-critical domains, where deterministic and predictable task execution is essential. Yet they are increasingly exposed to ionizing radiation, which can compromise system dependability. To assess FreeRTOS under such conditions, we introduce KRONOS, a software-based, non-intrusive post-propagation Fault Injection (FI) framework that injects transient and permanent faults into Operating System-visible kernel data structures without specialized hardware or debug interfaces. Using KRONOS, we conduct an extensive FI campaign on core FreeRTOS kernel components, including scheduler-related variables and Task Control Blocks (TCBs), characterizing the impact of kernel-level corruptions on functional correctness, timing behavior, and availability. The results show that corruption of pointer and key scheduler-related variables frequently leads to crashes, whereas many TCB fields have only a limited impact on system availability.
LG Dec 24, 2025
LuxIA: A Lightweight Unitary matriX-based Framework Built on an Iterative Algorithm for Photonic Neural Network Training
Tzamn Melendez Carmona, Federico Marchesin, Marco P. Abrate et al.
Photonic Neural Networks (PNNs) present promising opportunities for accelerating machine learning by leveraging the unique benefits of photonic circuits. However, current state-of-the-art PNN simulation tools face significant scalability challenges when training large-scale PNNs, because transfer matrix calculations are computationally demanding and result in high memory and time consumption. To overcome these limitations, we introduce the Slicing method, an efficient transfer matrix computation approach compatible with back-propagation. We integrate this method into LuxIA, a unified simulation and training framework. The Slicing method substantially reduces memory usage and execution time, enabling scalable simulation and training of large PNNs. Experimental evaluations across various photonic architectures and standard datasets, including MNIST, Digits, and Olivetti Faces, show that LuxIA consistently surpasses existing tools in speed and scalability. Our results advance the state of the art in PNN simulation, making it feasible to explore and optimize larger, more complex architectures. By addressing key computational bottlenecks, LuxIA facilitates broader adoption and accelerates innovation in AI hardware through photonic technologies. This work paves the way for more efficient and scalable photonic neural network research and development.
NE Jul 4, 2025 Code
SFATTI: Spiking FPGA Accelerator for Temporal Task-driven Inference -- A Case Study on MNIST
Alessio Caviglia, Filippo Marostica, Alessio Carpegna et al.
Hardware accelerators are essential for achieving low-latency, energy-efficient inference in edge applications like image recognition. Spiking Neural Networks (SNNs) are particularly promising due to their event-driven and temporally sparse nature, making them well-suited for low-power Field Programmable Gate Array (FPGA)-based deployment. This paper explores using the open-source Spiker+ framework to generate optimized SNN accelerators for handwritten digit recognition on the MNIST dataset. Spiker+ enables high-level specification of network topologies, neuron models, and quantization, automatically generating deployable HDL. We evaluate multiple configurations and analyze trade-offs relevant to edge computing constraints.
45.0 NE May 4
Elastic Spiking Transformers for Efficient Gesture Understanding
Alberto Ancilotto, Gianluca Amprimo, Stefano Di Carlo et al.
Spiking Neural Networks (SNNs), particularly Spiking Transformers, offer energy-efficient processing of event-based sensor data for healthcare applications. Yet current architectures are rigid: they are trained and deployed as static networks with fixed parameter counts and computational graphs. This limits deployment on neuromorphic hardware such as Loihi and SpiNNaker, where on-chip constraints often require smaller models that trade accuracy for feasibility. We introduce the Elastic Spiking Transformer, a runtime-adaptive architecture that brings elasticity into the spiking paradigm. Inspired by Matryoshka-style representation learning, it embeds nested elasticity in the Feature Extractor, Spiking Self-Attention, and Feed-Forward blocks. Through granularity-aware weight sharing, a single universal model can dynamically slice network width and attention heads at inference time without retraining. This design provides two key advantages for SNNs. First, it allows the model to adjust its parameter footprint to different hardware memory budgets. Second, reducing active neurons also lowers spike firing rates, yielding proportional reductions in synaptic operations, an energy benefit not directly available in standard artificial neural networks. We evaluate the approach on CIFAR10/100, CIFAR10-DVS, and the EHWGesture clinical gesture understanding dataset. Results show that one Elastic Spiking Transformer spans a broad range of complexity-accuracy trade-offs, matching or surpassing independently trained baselines while supporting adaptive, real-time gesture recognition on resource-constrained edge devices.
NE Jan 2, 2024
Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge
Alessio Carpegna, Alessandro Savino, Stefano Di Carlo
Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Network (SNN) accelerators on FPGA for inference at the edge. Spiker+ presents a configurable multi-layer hardware SNN, a library of highly efficient neuron architectures, and a design framework, enabling the development of complex neural network accelerators with a few lines of Python code. Spiker+ is tested on two benchmark datasets, MNIST and the Spiking Heidelberg Digits (SHD). On MNIST, it demonstrates competitive performance compared to state-of-the-art SNN accelerators. It outperforms them in resource allocation, requiring 7,612 logic cells and 18 Block RAMs (BRAMs), which lets it fit in very small FPGAs, and in power consumption, drawing only 180 mW for a complete inference on an input image. The latency is comparable to that of state-of-the-art designs, at 780 µs/img. To the authors' knowledge, Spiker+ is the first SNN accelerator tested on the SHD. In this case, the accelerator requires 18,268 logic cells and 51 BRAMs, with an overall power consumption of 430 mW and a latency of 54 µs for a complete inference on input data. This underscores the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution for deploying configurable and tunable SNN architectures in resource- and power-constrained edge applications.
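The power and latency figures quoted above can be combined into a rough energy-per-inference estimate. The sketch below assumes that the reported power is the average power drawn over the whole latency window, which the abstract does not state explicitly; the function name is illustrative and not part of Spiker+.

```python
def energy_per_inference_uj(power_mw: float, latency_us: float) -> float:
    """Estimate energy per inference in microjoules.

    Assumption: power_mw is the average power over the full latency window.
    mW * us gives nJ, so divide by 1000 to obtain uJ.
    """
    return power_mw * latency_us / 1000.0


# Figures taken from the abstract above.
mnist_uj = energy_per_inference_uj(power_mw=180, latency_us=780)
shd_uj = energy_per_inference_uj(power_mw=430, latency_us=54)
print(f"MNIST: {mnist_uj:.1f} uJ per inference")
print(f"SHD:   {shd_uj:.1f} uJ per inference")
```

Under that assumption, the estimate comes to about 140.4 µJ per MNIST image and about 23.2 µJ per SHD sample.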
NE Mar 30, 2024
SpikingJET: Enhancing Fault Injection for Fully and Convolutional Spiking Neural Networks
Anil Bayram Gogebakan, Enrico Magliano, Alessio Carpegna et al.
As artificial neural networks become increasingly integrated into safety-critical systems such as autonomous vehicles, devices for medical diagnosis, and industrial automation, ensuring their reliability in the face of random hardware faults becomes paramount. This paper introduces SpikingJET, a novel fault injector designed specifically for fully connected and convolutional Spiking Neural Networks (SNNs). Our work underscores the critical need to evaluate the resilience of SNNs to hardware faults, considering their growing prominence in real-world applications. SpikingJET provides a comprehensive platform for assessing the resilience of SNNs by inducing errors and injecting faults into critical components such as synaptic weights, neuron model parameters, internal states, and activation functions. This paper demonstrates the effectiveness of SpikingJET through extensive software-level experiments on various SNN architectures, revealing insights into their vulnerability and resilience to hardware faults. Moreover, highlighting the importance of fault resilience in SNNs contributes to the ongoing effort to enhance the reliability and safety of Neural Network (NN)-powered systems in diverse domains.
NE Apr 4, 2024
SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA
Dario Padovano, Alessio Carpegna, Alessandro Savino et al.
One of today's main concerns is to bring Artificial Intelligence power to embedded systems for edge applications. The hardware resources and power consumption required by state-of-the-art models are incompatible with the constrained environments observed in edge systems, such as IoT nodes and wearable devices. Spiking Neural Networks (SNNs) can represent a solution in this sense: inspired by neuroscience, they reach unparalleled power and resource efficiency when run on dedicated hardware accelerators. However, when designing such accelerators, the number of possible design choices is huge. This paper presents SpikeExplorer, a modular and flexible Python tool for hardware-oriented Automatic Design Space Exploration to automate the configuration of FPGA accelerators for SNNs. Using Bayesian optimization, SpikeExplorer enables hardware-centric multi-objective optimization, supporting factors such as accuracy, area, latency, power, and various combinations of them during the exploration process. The tool searches for the optimal network architecture, neuron model, and internal and training parameters, trying to meet the constraints imposed by the user. It allows for a straightforward network configuration, providing the full set of explored points so that the user can pick the trade-off that best fits their needs. The potential of SpikeExplorer is showcased using three benchmark datasets. It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image, making it a powerful tool for automatically optimizing SNNs.
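Picking a trade-off from a set of explored design points usually starts by discarding points that are worse than some other point on every objective. The sketch below shows such a Pareto filter. It is not the tool's Bayesian optimizer, the function names are hypothetical, and the design points are made-up numbers used only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one value per objective, all to be minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical explored points: (error rate, power in mW, latency in ms).
explored = [
    (0.05, 150.0, 0.20),
    (0.04, 210.0, 0.25),
    (0.06, 160.0, 0.30),  # worse than the first point on all three objectives
]
for point in pareto_front(explored):
    print(point)
```

Accuracy is expressed here as an error rate so that every objective is minimized; the user would then choose among the surviving points.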
NE May 8, 2024
Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks
Alberto Dequino, Alessio Carpegna, Davide Nadalini et al.
Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples with latent representations of previously learned data, to mitigate forgetting. Experiments on the Heidelberg SHD dataset with Sample and Class-Incremental tasks reach a Top-1 accuracy of 92.5% and 92%, respectively, without forgetting the previously learned information. Furthermore, we minimize the LRs' requirements by applying a time-domain compression, reducing by two orders of magnitude their memory requirement, with respect to a naive rehearsal setup, with a maximum accuracy drop of 4%. On a Multi-Class-Incremental task, our SNN learns 10 new classes from an initial set of 10, reaching a Top-1 accuracy of 78.4% on the full test set.
AR Dec 29, 2023
Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach
Sepide Saeedi, Alessandro Savino, Stefano Di Carlo
Approximate Computing (AxC) techniques have become increasingly popular for trading off accuracy against performance gains in various applications. Selecting the best AxC techniques for a given application is challenging. Among the proposed approaches for exploring the design space, Machine Learning approaches such as Reinforcement Learning (RL) show promising results. In this paper, we propose an RL-based multi-objective Design Space Exploration strategy to find approximate versions of the application that balance accuracy degradation against reductions in power and computation time. Our experimental results show a good trade-off between accuracy degradation and decreased power and computation time for some benchmarks.
CV Sep 9, 2025
EHWGesture -- A dataset for multimodal understanding of clinical gestures
Gianluca Amprimo, Alberto Ancilotto, Alessandro Savino et al.
Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGesture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings (6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset's potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGesture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.
NE Jun 16, 2025
Energy-Efficient Digital Design: A Comparative Study of Event-Driven and Clock-Driven Spiking Neurons
Filippo Marostica, Alessio Carpegna, Alessandro Savino et al.
This paper presents a comprehensive evaluation of Spiking Neural Network (SNN) neuron models for hardware acceleration by comparing event-driven and clock-driven implementations. We begin our investigation in software, rapidly prototyping and testing various SNN models based on different variants of the Leaky Integrate and Fire (LIF) neuron across multiple datasets. This phase enables controlled performance assessment and informs design refinement. Our subsequent hardware phase, implemented on FPGA, validates the simulation findings and offers practical insights into design trade-offs. In particular, we examine how variations in input stimuli influence key performance metrics such as latency, power consumption, energy efficiency, and resource utilization. These results yield valuable guidelines for constructing energy-efficient, real-time neuromorphic systems. Overall, our work bridges software simulation and hardware realization, advancing the development of next-generation SNN accelerators.
CR Jun 11, 2024
CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation
Sadek Misto Kirdi, Nicola Scarano, Franco Oberti et al.
Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN). To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper investigates the feasibility of generating synthetic datasets by combining the modeling capabilities of simulation frameworks such as Simulink with a robust representation of attack models. It presents CARACAS, a vehicular model that includes component control via CAN messages and attack injection capabilities. CARACAS showcases the efficacy of this methodology, including a Battery Electric Vehicle (BEV) model, and focuses on attacks targeting torque control in two distinct scenarios.
LG Jun 6, 2024
R-CONV: An Analytical Approach for Efficient Data Reconstruction via Convolutional Gradients
Tamer Ahmed Eltaras, Qutaibah Malluhi, Alessandro Savino et al.
In the effort to learn from extensive collections of distributed data, federated learning has emerged as a promising approach for preserving privacy by using a gradient-sharing mechanism instead of exchanging raw data. However, recent studies show that private training data can be leaked through many gradient attacks. While previous analytical-based attacks have successfully reconstructed input data from fully connected layers, their effectiveness diminishes when applied to convolutional layers. This paper introduces an advanced data leakage method to efficiently exploit convolutional layers' gradients. We present a surprising finding: even with non-fully invertible activation functions, such as ReLU, we can analytically reconstruct training samples from the gradients. To the best of our knowledge, this is the first analytical approach that successfully reconstructs convolutional layer inputs directly from the gradients, bypassing the need to reconstruct layers' outputs. Prior research has mainly concentrated on the weight constraints of convolution layers, overlooking the significance of gradient constraints. Our findings demonstrate that existing analytical methods used to estimate the risk of gradient attacks lack accuracy. In some layers, attacks can be launched with less than 5% of the reported constraints.
AR Jan 16, 2024
A Micro Architectural Events Aware Real-Time Embedded System Fault Injector
Enrico Magliano, Alessio Carpegna, Alessandro Savino et al.
The increasing complexity of modern systems poses significant challenges to the reliability, trustworthiness, and security of Safety-Critical Real-Time Embedded Systems (SACRES). Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's Performance Monitoring Unit (PMU) and the debugging interface, with a specific focus on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES.
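The bit-flip fault model described above can be illustrated in software with a single XOR on a memory word. The sketch below is a toy model that treats memory as a Python list of 32-bit words; it does not use a PMU or debug interface, and the function names are hypothetical rather than taken from the paper's tool. Seeding the random generator stands in for the repeatability requirement.

```python
import random
from typing import List, Tuple


def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Return word with one bit inverted, kept within the given word width."""
    if not 0 <= bit < width:
        raise ValueError(f"bit index {bit} outside a {width}-bit word")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def inject_random_bit_flip(memory: List[int], rng: random.Random,
                           width: int = 32) -> Tuple[int, int]:
    """Flip one randomly chosen bit in a randomly chosen word, in place.

    Returns the (address, bit) pair so the injection can be logged and replayed.
    """
    addr = rng.randrange(len(memory))
    bit = rng.randrange(width)
    memory[addr] = flip_bit(memory[addr], bit, width)
    return addr, bit


# A fixed seed makes the injection campaign repeatable.
memory = [0x00000000] * 8
addr, bit = inject_random_bit_flip(memory, random.Random(42))
print(f"flipped bit {bit} of word {addr}: 0x{memory[addr]:08X}")
```

Flipping the same bit twice restores the original value, which is a convenient way to model a transient fault that is later overwritten.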
CR Feb 21, 2022
Using analog scrambling circuits for automotive sensor integrity and authenticity
Cristiano Pegoraro Chenet, Alessandro Savino, Stefano Di Carlo
The automotive domain is rapidly increasing the number of complex and interconnected electronic systems embedded in vehicles. A considerable proportion of them are real-time safety-critical devices that must be protected against cybersecurity attacks. Recent regulations require carmakers to protect vehicles against the replacement of trusted electronic hardware and the manipulation of information collected by sensors. Analog sensors are critical elements whose security is now strictly regulated by the new UN R155 recommendation, but well-developed and established solutions are lacking. This work takes a step forward in this direction by adding integrity and authentication to automotive analog sensors. It proposes a scheme that creates analog signatures based on a scrambling mechanism implemented with commercial off-the-shelf (COTS) operational amplifiers. The proposed architecture implements a hardware secret and a hard-to-invert exponential function to generate a signal's signature. A prototype of the circuit was implemented and simulated in LTspice. Preliminary results show the feasibility of the proposed scheme and provide interesting hints for further developments to increase the robustness of the approach.
NE Jan 18, 2022
Spiker: an FPGA-optimized Hardware acceleration for Spiking Neural Networks
Alessio Carpegna, Alessandro Savino, Stefano Di Carlo
Spiking Neural Networks (SNNs) are an emerging type of biologically plausible and efficient Artificial Neural Network (ANN). This work presents the development of a hardware accelerator for an SNN for high-performance inference, targeting a Xilinx Artix-7 Field Programmable Gate Array (FPGA). The model used inside the neuron is the Leaky Integrate and Fire (LIF). The execution is clock-driven, meaning that the internal state of the neuron is updated at every clock cycle, even in the absence of spikes. The inference capabilities of the accelerator are evaluated using the MNIST dataset. The training is performed offline on a full-precision model. The results show a good improvement in performance compared with state-of-the-art accelerators, requiring 215μs per image. The energy consumption is slightly higher than that of the most optimized design, with an average value of 13mJ per image. The test design consists of a single layer of four hundred neurons and uses around 40% of the available resources on the FPGA. This makes it suitable for time-constrained applications at the edge, leaving space for other acceleration tasks on the FPGA.
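The clock-driven LIF update described above can be sketched in a few lines. The version below is a floating-point software model under assumed parameters: the decay factor `beta`, the threshold, and the hard reset to zero are illustrative choices, not values or design decisions taken from the accelerator.

```python
from typing import List, Tuple


def lif_step(v: float, input_current: float,
             beta: float = 0.9, threshold: float = 1.0) -> Tuple[float, bool]:
    """One clock-driven update of a Leaky Integrate and Fire neuron.

    The membrane potential leaks (multiplied by beta), integrates the input,
    and is reset to zero when it reaches the threshold.
    """
    v = beta * v + input_current
    spike = v >= threshold
    if spike:
        v = 0.0  # hard reset; reset-by-subtraction is a common alternative
    return v, spike


def run_lif(inputs: List[float], beta: float = 0.9,
            threshold: float = 1.0) -> List[int]:
    """Update the neuron at every time step, even when the input is zero."""
    v = 0.0
    spikes = []
    for x in inputs:
        v, fired = lif_step(v, x, beta, threshold)
        spikes.append(int(fired))
    return spikes


print(run_lif([0.6, 0.6, 0.0, 0.0]))  # -> [0, 1, 0, 0]
```

Because the state is updated on every step regardless of input, the amount of work is fixed per time step; an event-driven design would instead update only when a spike arrives.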
CR Dec 15, 2021
EXT-TAURUM P2T: an Extended Secure CAN-FD Architecture for Road Vehicles
Franco Oberti, Alessandro Savino, Ernesto Sanchez et al.
The automobile industry no longer relies on purely mechanical systems; instead, it benefits from advanced Electronic Control Units (ECUs) that provide new and complex functionalities in the effort to move toward fully connected cars. However, connected cars provide a dangerous playground for hackers. Vehicles are becoming increasingly vulnerable to cyber attacks as they come equipped with more connected features and control systems. This situation may expose strategic assets in the automotive value chain. In this scenario, the Controller Area Network (CAN) is the most widely used communication protocol in the automotive domain. However, this protocol lacks encryption and authentication. Consequently, any malicious or hijacked node can cause catastrophic accidents and financial loss. Starting from the analysis of the vulnerabilities connected to the CAN communication protocol in the automotive domain, this paper proposes EXT-TAURUM P2T, a new low-cost secure CAN-FD architecture for the automotive domain that implements secure communication among ECUs, a novel key provisioning strategy, intelligent throughput management, and hardware signature mechanisms. The proposed architecture has been implemented using a commercial Multi-Protocol Vehicle Interface module, and the obtained results experimentally demonstrate the approach's feasibility.
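To make the missing-authentication problem concrete, the sketch below adds a truncated HMAC tag and a message counter to a CAN-FD payload. It is a generic illustration and not the EXT-TAURUM P2T scheme: the payload layout, the 8-byte tag length, the 4-byte counter, and the function names are all assumptions. The only protocol fact used is that a CAN-FD frame carries at most 64 data bytes.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 8             # truncated tag length in bytes (illustrative choice)
COUNTER_LEN = 4         # freshness counter length in bytes (illustrative choice)
MAX_CANFD_PAYLOAD = 64  # CAN-FD data field limit in bytes


def _tag(key: bytes, can_id: int, counter_bytes: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 over identifier, counter, and data, truncated to TAG_LEN."""
    message = can_id.to_bytes(4, "big") + counter_bytes + data
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_LEN]


def protect(key: bytes, can_id: int, counter: int, data: bytes) -> bytes:
    """Build an authenticated payload: counter || data || tag."""
    counter_bytes = counter.to_bytes(COUNTER_LEN, "big")
    payload = counter_bytes + data + _tag(key, can_id, counter_bytes, data)
    if len(payload) > MAX_CANFD_PAYLOAD:
        raise ValueError("authenticated payload exceeds the CAN-FD data field")
    return payload


def verify(key: bytes, can_id: int, payload: bytes,
           last_counter: int) -> Optional[bytes]:
    """Return the data if the tag is valid and the counter is fresh, else None."""
    if len(payload) < COUNTER_LEN + TAG_LEN:
        return None
    counter_bytes = payload[:COUNTER_LEN]
    data, tag = payload[COUNTER_LEN:-TAG_LEN], payload[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, can_id, counter_bytes, data)):
        return None  # forged or corrupted frame
    if int.from_bytes(counter_bytes, "big") <= last_counter:
        return None  # replayed frame
    return data


key = b"\x00" * 16  # placeholder key for the example only
frame = protect(key, can_id=0x123, counter=1, data=b"\x10\x20")
print(verify(key, 0x123, frame, last_counter=0))
```

The counter and tag consume 12 of the 64 available bytes in this layout, which is the kind of throughput cost a secure CAN-FD design has to manage.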
CR Dec 2, 2019
Securing Soft IP Cores in FPGA based Reconfigurable Mobile Heterogeneous Systems
Alberto Carelli, Cataldo Basile, Alessandro Savino et al.
The mobile application market is rapidly growing and changing, constantly offering new software to install on increasingly powerful devices. Mobile devices are becoming pervasive and more heterogeneous, embedding the latest technologies such as multicore architectures, special-purpose circuits, and reconfigurable logic. In a future mobile market scenario, reconfigurable systems are employed to provide high-speed functionalities that assist the execution of applications. However, new security concerns are introduced. In particular, protecting the Intellectual Property of the exchanged soft IP cores is a serious concern. The available techniques for preserving integrity, confidentiality, and authenticity suffer from the limitation of relying heavily on the system designer. In this paper we propose two different protocols suitable for the secure deployment of soft IP cores in FPGA-based mobile heterogeneous systems where multiple independent actors are involved: one for a simple scenario requiring a trust relationship between entities, and one for a more complex scenario where no trust relationship exists, which adopts the Direct Anonymous Attestation protocol. Finally, we provide a prototype implementation of the proposed architectures.