ARAug 28, 2024Code
CGRA4ML: A Hardware/Software Framework to Implement Neural Networks for Scientific Edge ComputingG Abarajithan, Zhenghua Ma, Ravidu Munasinghe et al.
The scientific community increasingly relies on machine learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand accelerators that combine extremely high performance with programmability, ease of integration, and straightforward verification. We present cgra4ml, an open-source, modular framework that generates parameterizable CGRA accelerators in synthesizable SystemVerilog RTL, tailored to common ML compute patterns found in scientific applications. The framework supports seamless system integration through AXI-compliant interfaces and open-source DMA components, and it includes automatic firmware generation for programming the accelerator. A comprehensive verification suite and a runtime firmware stack further support deployment across diverse SoC platforms. cgra4ml provides a modular, full-stack infrastructure, including a Python API, SystemVerilog hardware, TCL toolflows, and a C runtime, which facilitates easy integration and experimentation, allowing scientists to focus on innovation rather than dealing with the intricacies of hardware design and optimization. We demonstrate the effectiveness of cgra4ml to implement common scientific edge neural networks using ASIC and FPGA design flows.
NIApr 15, 2022
DeepCSI: Rethinking Wi-Fi Radio Fingerprinting Through MU-MIMO CSI Feedback Deep LearningFrancesca Meneghello, Michele Rossi, Francesco Restuccia
We present DeepCSI, a novel approach to Wi-Fi radio fingerprinting (RFP) which leverages standard-compliant beamforming feedback matrices to authenticate MU-MIMO Wi-Fi devices on the move. By capturing unique imperfections in off-the-shelf radio circuitry, RFP techniques can identify wireless devices directly at the physical layer, allowing low-latency low-energy cryptography-free authentication. However, existing Wi-Fi RFP techniques are based on software-defined radio (SDRs), which may ultimately prevent their widespread adoption. Moreover, it is unclear whether existing strategies can work in the presence of MU-MIMO transmitters - a key technology in modern Wi-Fi standards. Conversely from prior work, DeepCSI does not require SDR technologies and can be run on any low-cost Wi-Fi device to authenticate MU-MIMO transmitters. Our key intuition is that imperfections in the transmitter's radio circuitry percolate onto the beamforming feedback matrix, and thus RFP can be performed without explicit channel state information (CSI) computation. DeepCSI is robust to inter-stream and inter-user interference being the beamforming feedback not affected by those phenomena. We extensively evaluate the performance of DeepCSI through a massive data collection campaign performed in the wild with off-the-shelf equipment, where 10 MU-MIMO Wi-Fi radios emit signals in different positions. Experimental results indicate that DeepCSI correctly identifies the transmitter with an accuracy of up to 98%. The identification accuracy remains above 82% when the device moves within the environment. To allow replicability and provide a performance benchmark, we pledge to share the 800 GB datasets - collected in static and, for the first time, dynamic conditions - and the code database with the community.
NIFeb 2, 2023
Exposing the CSI: A Systematic Investigation of CSI-based Wi-Fi Sensing Capabilities and LimitationsMarco Cominelli, Francesco Gringoli, Francesco Restuccia
Thanks to the ubiquitous deployment of Wi-Fi hotspots, channel state information (CSI)-based Wi-Fi sensing can unleash game-changing applications in many fields, such as healthcare, security, and entertainment. However, despite one decade of active research on Wi-Fi sensing, most existing work only considers legacy IEEE 802.11n devices, often in particular and strictly-controlled environments. Worse yet, there is a fundamental lack of understanding of the impact on CSI-based sensing of modern Wi-Fi features, such as 160-MHz bandwidth, multiple-input multiple-output (MIMO) transmissions, and increased spectral resolution in IEEE 802.11ax (Wi-Fi 6). This work aims to shed light on the impact of Wi-Fi 6 features on the sensing performance and to create a benchmark for future research on Wi-Fi sensing. To this end, we perform an extensive CSI data collection campaign involving 3 individuals, 3 environments, and 12 activities, using Wi-Fi 6 signals. An anonymized ground truth obtained through video recording accompanies our 80-GB dataset, which contains almost two hours of CSI data from three collectors. We leverage our dataset to dissect the performance of a state-of-the-art sensing framework across different environments and individuals. Our key findings suggest that (i) MIMO transmissions and higher spectral resolution might be more beneficial than larger bandwidth for sensing applications; (ii) there is a pressing need to standardize research on Wi-Fi sensing because the path towards a truly environment-independent framework is still uncertain. To ease the experiments' replicability and address the current lack of Wi-Fi 6 CSI datasets, we release our 80-GB dataset to the community.
CRJul 31, 2024
Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research ChallengesSazzad Sayyed, Milin Zhang, Shahriar Rifat et al.
In order to deploy deep neural networks (DNNs) in high-stakes scenarios, it is imperative that DNNs provide inference robust to external perturbations - both intentional and unintentional. Although the resilience of DNNs to intentional and unintentional perturbations has been widely investigated, a unified vision of these inherently intertwined problem domains is still missing. In this work, we fill this gap by providing a survey of the state of the art and highlighting the similarities of the proposed approaches.We also analyze the research challenges that need to be addressed to deploy resilient and secure DNNs. As there has not been any such survey connecting the resilience of DNNs to intentional and unintentional perturbations, we believe this work can help advance the frontier in both domains by enabling the exchange of ideas between the two communities.
NIJan 16, 2023
A$^2$-UAV: Application-Aware Content and Network Optimization of Edge-Assisted UAV SystemsAndrea Coletta, Flavio Giorgi, Gaia Maselli et al.
To perform advanced surveillance, Unmanned Aerial Vehicles (UAVs) require the execution of edge-assisted computer vision (CV) tasks. In multi-hop UAV networks, the successful transmission of these tasks to the edge is severely challenged due to severe bandwidth constraints. For this reason, we propose a novel A$^2$-UAV framework to optimize the number of correctly executed tasks at the edge. In stark contrast with existing art, we take an application-aware approach and formulate a novel pplication-Aware Task Planning Problem (A$^2$-TPP) that takes into account (i) the relationship between deep neural network (DNN) accuracy and image compression for the classes of interest based on the available dataset, (ii) the target positions, (iii) the current energy/position of the UAVs to optimize routing, data pre-processing and target assignment for each UAV. We demonstrate A$^2$-TPP is NP-Hard and propose a polynomial-time algorithm to solve it efficiently. We extensively evaluate A$^2$-UAV through real-world experiments with a testbed composed by four DJI Mavic Air 2 UAVs. We consider state-of-the-art image classification tasks with four different DNN models (i.e., DenseNet, ResNet152, ResNet50 and MobileNet-V2) and object detection tasks using YoloV4 trained on the ImageNet dataset. Results show that A$^2$-UAV attains on average around 38% more accomplished tasks than the state-of-the-art, with 400% more accomplished tasks when the number of targets increases significantly. To allow full reproducibility, we pledge to share datasets and code with the research community.
56.5ARApr 21
Design Rules for Extreme-Edge Scientific Computing on AI EnginesZhenghua Ma, G Abarajithan, Dimitrios Danopoulos et al.
Extreme-edge scientific applications use machine learning models to analyze sensor data and make real-time decisions. Their stringent latency and throughput requirements demand small batch sizes and require that model weights remain fully on-chip. Spatial dataflow implementations are common for extreme-edge applications. Spatial dataflow works well for small networks, but it fails to scale to larger models due to inherent resource scaling limitations. AI Engines on modern FPGA SoCs offer a promising alternative with high compute density and additional on-chip memory. However, the architecture, programming model, and performance-scaling behavior of AI Engines differ fundamentally from those of the programmable logic, making direct comparison non-trivial and the benefits of using AI Engines unclear. This work addresses how and when extreme-edge scientific neural networks should be implemented on AI Engines versus programmable logic. We provide systematic architectural characterization and micro-benchmarking and introduce a latency-adjusted resource equivalence (LARE) metric that identifies when AI Engine implementations outperform programmable logic designs. We further propose spatial and API-level dataflow optimizations tailored to low-latency scientific inference. Finally, we demonstrate the successful deployment of end-to-end neural networks on AI Engines that cannot fit on programmable logic when using the hlsml toolchain.
NIOct 12, 2023
SplitBeam: Effective and Efficient Beamforming in Wi-Fi Networks Through Split ComputingNiloofar Bahadori, Yoshitomo Matsubara, Marco Levorato et al.
Modern IEEE 802.11 (Wi-Fi) networks extensively rely on multiple-input multiple-output (MIMO) to significantly improve throughput. To correctly beamform MIMO transmissions, the access point needs to frequently acquire a beamforming matrix (BM) from each connected station. However, the size of the matrix grows with the number of antennas and subcarriers, resulting in an increasing amount of airtime overhead and computational load at the station. Conventional approaches come with either excessive computational load or loss of beamforming precision. For this reason, we propose SplitBeam, a new framework where we train a split deep neural network (DNN) to directly output the BM given the channel state information (CSI) matrix as input. We formulate and solve a bottleneck optimization problem (BOP) to keep computation, airtime overhead, and bit error rate (BER) below application requirements. We perform extensive experimental CSI collection with off-the-shelf Wi-Fi devices in two distinct environments and compare the performance of SplitBeam with the standard IEEE 802.11 algorithm for BM feedback and the state-of-the-art DNN-based approach LB-SciFi. Our experimental results show that SplitBeam reduces the beamforming feedback size and computational complexity by respectively up to 81% and 84% while maintaining BER within about 10^-3 of existing approaches. We also implement the SplitBeam DNNs on FPGA hardware to estimate the end-to-end BM reporting delay, and show that the latter is less than 10 milliseconds in the most complex scenario, which is the target channel sounding frequency in realistic multi-user MIMO scenarios.
17.9ARMar 26
FireBridge: Cycle-Accurate Hardware + Firmware Co-Verification for Modern AcceleratorsG Abarajithan, Zhenghua Ma, Francesco Restuccia et al.
Hardware-firmware integration is becoming a productivity bottleneck due to the increasing complexity of accelerators, characterized by intricate memory hierarchies and firmware-intensive execution. While numerous verification techniques focus on early-stage, approximate modeling of such systems to speed up initial development, developers still rely heavily on FPGA emulation to integrate firmware with RTL/HLS hardware, resulting in significant delays in debug iterations and time-to-market. We present a fast, cycle-accurate co-verification framework that bridges production firmware and RTL/gate-level hardware. FIREBRIDGE enables firmware debugging, profiling, and verification in seconds using standard simulators such as VCS, Vivado Xsim, or Xcelium, by compiling the firmware for x86 and bridging it with simulated subsystems via randomized memory bridges. Our approach provides off-chip data movement profiling, memory congestion emulation, and register-level protocol testing, which are critical for modern accelerator verification. We demonstrate a speedup of up to 50x in debug iteration over the conventional FPGA-based flow for system integration between RTL/HLS and production firmware on various types of accelerators, such as systolic arrays and CGRAs, while ensuring functional equivalence. FIREBRIDGE accelerates system integration by supporting robust co-verification of hardware and firmware, and promotes a structured, parallel development workflow tailored for teams building heterogeneous computing platforms.
LGSep 29, 2023
Adversarial Attacks to Latent Representations of Distributed Neural Networks in Split ComputingMilin Zhang, Mohammad Abdi, Jonathan Ashdown et al.
Distributed deep neural networks (DNNs) have been shown to reduce the computational burden of mobile devices and decrease the end-to-end inference latency in edge computing scenarios. While distributed DNNs have been studied, to the best of our knowledge, the resilience of distributed DNNs to adversarial action remains an open problem. In this paper, we fill the existing research gap by rigorously analyzing the robustness of distributed DNNs against adversarial action. We cast this problem in the context of information theory and rigorously proved that (i) the compressed latent dimension improves the robustness but also affect task-oriented performance; and (ii) the deeper splitting point enhances the robustness but also increases the computational burden. These two trade-offs provide a novel perspective to design robust distributed DNN. To test our theoretical findings, we perform extensive experimental analysis by considering 6 different DNN architectures, 6 different approaches for distributed DNN and 10 different adversarial attacks using the ImageNet-1K dataset.
CVSep 15, 2024
DARDA: Domain-Aware Real-Time Dynamic Neural Network AdaptationShahriar Rifat, Jonathan Ashdown, Francesco Restuccia
Test Time Adaptation (TTA) has emerged as a practical solution to mitigate the performance degradation of Deep Neural Networks (DNNs) in the presence of corruption/ noise affecting inputs. Existing approaches in TTA continuously adapt the DNN, leading to excessive resource consumption and performance degradation due to accumulation of error stemming from lack of supervision. In this work, we propose Domain-Aware Real-Time Dynamic Adaptation (DARDA) to address such issues. Our key approach is to proactively learn latent representations of some corruption types, each one associated with a sub-network state tailored to correctly classify inputs affected by that corruption. After deployment, DARDA adapts the DNN to previously unseen corruptions in an unsupervised fashion by (i) estimating the latent representation of the ongoing corruption; (ii) selecting the sub-network whose associated corruption is the closest in the latent space to the ongoing corruption; and (iii) adapting DNN state, so that its representation matches the ongoing corruption. This way, DARDA is more resource efficient and can swiftly adapt to new distributions caused by different corruptions without requiring a large variety of input data. Through experiments with two popular mobile edge devices - Raspberry Pi and NVIDIA Jetson Nano - we show that DARDA reduces energy consumption and average cache memory footprint respectively by 1.74x and 2.64x with respect to the state of the art, while increasing the performance by 10.4%, 5.7% and 4.4% on CIFAR-10, CIFAR-100 and TinyImagenet.
CVOct 2, 2023
SINF: Semantic Neural Network Inference with Semantic SubgraphsA. Q. M. Sazzad Sayyed, Francesco Restuccia
This paper proposes Semantic Inference (SINF) that creates semantic subgraphs in a Deep Neural Network(DNN) based on a new Discriminative Capability Score (DCS) to drastically reduce the DNN computational load with limited performance loss.~We evaluate the performance SINF on VGG16, VGG19, and ResNet50 DNNs trained on CIFAR100 and a subset of the ImageNet dataset. Moreover, we compare its performance against 6 state-of-the-art pruning approaches. Our results show that (i) on average, SINF reduces the inference time of VGG16, VGG19, and ResNet50 respectively by up to 29%, 35%, and 15% with only 3.75%, 0.17%, and 6.75% accuracy loss for CIFAR100 while for ImageNet benchmark, the reduction in inference time is 18%, 22%, and 9% for accuracy drop of 3%, 2.5%, and 6%; (ii) DCS achieves respectively up to 3.65%, 4.25%, and 2.36% better accuracy with VGG16, VGG19, and ResNet50 with respect to existing discriminative scores for CIFAR100 and the same for ImageNet is 8.9%, 5.8%, and 5.2% respectively. Through experimental evaluation on Raspberry Pi and NVIDIA Jetson Nano, we show SINF is about 51% and 38% more energy efficient and takes about 25% and 17% less inference time than the base model for CIFAR100 and ImageNet.
CVJan 22
TinySense: Effective CSI Compression for Scalable and Accurate Wi-Fi SensingToan Gian, Dung T. Tran, Viet Quoc Pham et al.
With the growing demand for device-free and privacy-preserving sensing solutions, Wi-Fi sensing has emerged as a promising approach for human pose estimation (HPE). However, existing methods often process vast amounts of channel state information (CSI) data directly, ultimately straining networking resources. This paper introduces TinySense, an efficient compression framework that enhances the scalability of Wi-Fi-based human sensing. Our approach is based on a new vector quantization-based generative adversarial network (VQGAN). Specifically, by leveraging a VQGAN-learned codebook, TinySense significantly reduces CSI data while maintaining the accuracy required for reliable HPE. To optimize compression, we employ the K-means algorithm to dynamically adjust compression bitrates to cluster a large-scale pre-trained codebook into smaller subsets. Furthermore, a Transformer model is incorporated to mitigate bitrate loss, enhancing robustness in unreliable networking conditions. We prototype TinySense on an experimental testbed using Jetson Nano and Raspberry Pi to measure latency and network resource use. Extensive results demonstrate that TinySense significantly outperforms state-of-the-art compression schemes, achieving up to 1.5x higher HPE accuracy score (PCK20) under the same compression rate. It also reduces latency and networking overhead, respectively, by up to 5x and 2.5x. The code repository is available online at here.
LGNov 27, 2024
Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research ChallengesMilin Zhang, Mohammad Abdi, Venkat R. Dasari et al.
Semantic Edge Computing (SEC) and Semantic Communications (SemComs) have been proposed as viable approaches to achieve real-time edge-enabled intelligence in sixth-generation (6G) wireless networks. On one hand, SemCom leverages the strength of Deep Neural Networks (DNNs) to encode and communicate the semantic information only, while making it robust to channel distortions by compensating for wireless effects. Ultimately, this leads to an improvement in the communication efficiency. On the other hand, SEC has leveraged distributed DNNs to divide the computation of a DNN across different devices based on their computational and networking constraints. Although significant progress has been made in both fields, the literature lacks a systematic view to connect both fields. In this work, we fulfill the current gap by unifying the SEC and SemCom fields. We summarize the research problems in these two fields and provide a comprehensive review of the state of the art with a focus on their technical strengths and challenges.
CVFeb 15, 2024
SAWEC: Sensing-Assisted Wireless Edge ComputingKhandaker Foysal Haque, Francesca Meneghello, Md. Ebtidaul Karim et al.
Emerging mobile virtual reality (VR) systems will require to continuously perform complex computer vision tasks on ultra-high-resolution video frames through the execution of deep neural networks (DNNs)-based algorithms. Since state-of-the-art DNNs require computational power that is excessive for mobile devices, techniques based on wireless edge computing (WEC) have been recently proposed. However, existing WEC methods require the transmission and processing of a high amount of video data which may ultimately saturate the wireless link. In this paper, we propose a novel Sensing-Assisted Wireless Edge Computing (SAWEC) paradigm to address this issue. SAWEC leverages knowledge about the physical environment to reduce the end-to-end latency and overall computational burden by transmitting to the edge server only the relevant data for the delivery of the service. Our intuition is that the transmission of the portion of the video frames where there are no changes with respect to previous frames can be avoided. Specifically, we leverage wireless sensing techniques to estimate the location of objects in the environment and obtain insights about the environment dynamics. Hence, only the part of the frames where any environmental change is detected is transmitted and processed. We evaluated SAWEC by using a 10K 360$^{\circ}$ with a Wi-Fi 6 sensing system operating at 160 MHz and performing localization and tracking. We considered instance segmentation and object detection as benchmarking tasks for performance evaluation. We carried out experiments in an anechoic chamber and an entrance hall with two human subjects in six different setups. Experimental results show that SAWEC reduces both the channel occupation and end-to-end latency by more than 90% while improving the instance segmentation and object detection performance with respect to state-of-the-art WEC approaches.
LGMar 1, 2024
Resilience of Entropy Model in Distributed Neural NetworksMilin Zhang, Mohammad Abdi, Shahriar Rifat et al.
Distributed deep neural networks (DNNs) have emerged as a key technique to reduce communication overhead without sacrificing performance in edge computing systems. Recently, entropy coding has been introduced to further reduce the communication overhead. The key idea is to train the distributed DNN jointly with an entropy model, which is used as side information during inference time to adaptively encode latent representations into bit streams with variable length. To the best of our knowledge, the resilience of entropy models is yet to be investigated. As such, in this paper we formulate and investigate the resilience of entropy models to intentional interference (e.g., adversarial attacks) and unintentional interference (e.g., weather changes and motion blur). Through an extensive experimental campaign with 3 different DNN architectures, 2 entropy models and 4 rate-distortion trade-off factors, we demonstrate that the entropy attacks can increase the communication overhead by up to 95%. By separating compression features in frequency and spatial domain, we propose a new defense mechanism that can reduce the transmission overhead of the attacked input by about 9% compared to unperturbed data, with only about 2% accuracy loss. Importantly, the proposed defense mechanism is a standalone approach which can be applied in conjunction with approaches such as adversarial training to further improve robustness. Code will be shared for reproducibility.
DCNov 16, 2025
Semantic MultiplexingMohammad Abdi, Francesca Meneghello, Francesco Restuccia
Mobile devices increasingly require the parallel execution of several computing tasks offloaded at the wireless edge. Existing communication systems only support parallel transmissions at the bit level, which fundamentally limits the number of tasks that can be concurrently processed. To address this bottleneck, this paper introduces the new concept of Semantic Multiplexing. Our approach shifts stream multiplexing from bits to tasks by merging multiple task-related compressed representations into a single semantic representation. As such, Semantic Multiplexing can multiplex more tasks than the number of physical channels without adding antennas or widening bandwidth by extending the effective degrees of freedom at the semantic layer, without contradicting Shannon capacity rules. We have prototyped Semantic Multiplexing on an experimental testbed with Jetson Orin Nano and millimeter-wave software-defined radios and tested its performance on image classification and sentiment analysis while comparing to several existing baselines in semantic communications. Our experiments demonstrate that Semantic Multiplexing allows jointly processing multiple tasks at the semantic level while maintaining sufficient task accuracy. For example, image classification accuracy drops by less than 4% when increasing from 2 to 8 the number of tasks multiplexed over a 4$\times$4 channel. Semantic Multiplexing reduces latency, energy consumption, and communication load respectively by up to 8$\times$, 25$\times$, and 54$\times$ compared to the baselines while keeping comparable performance. We pledge to publicly share the complete software codebase and the collected datasets for reproducibility.
DCJan 11, 2022
SmartDet: Context-Aware Dynamic Control of Edge Task Offloading for Mobile Object DetectionDavide Callegaro, Francesco Restuccia, Marco Levorato
Mobile devices increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) can be used with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which may make the reference outdated and degrade tracking quality. Herein, we propose to use edge computing in this context, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose Katch-Up, a novel tracking mechanism that improves the system resilience to excessive OD delay. However, while Katch-Up significantly improves performance, it also increases the computing load of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns controlling the trade-off between resource utilization and OD performance. SmartDet takes as input context-related information related to the current video content and the current network conditions to optimize frequency and type of OD offloading, as well as Katch-Up utilization. We extensively evaluate SmartDet on a real-world testbed composed of a JetSon Nano as mobile device and a GTX 980 Ti as edge server, connected through a Wi-Fi link. Experimental results show that SmartDet achieves an optimal balance between tracking performance - mean Average Recall (mAR) and resource usage. With respect to a baseline with full Katch-Upusage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.
LGJan 7, 2022
BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split ComputingYoshitomo Matsubara, Davide Callegaro, Sameer Singh et al.
Although mission-critical applications require the use of deep neural networks (DNNs), their continuous execution at mobile devices results in a significant increase in energy consumption. While edge offloading can decrease energy consumption, erratic patterns in channel quality, network and edge server load can lead to severe disruption of the system's key operations. An alternative approach, called split computing, generates compressed representations within the model (called "bottlenecks"), to reduce bandwidth usage and energy consumption. Prior work has proposed approaches that introduce additional layers, to the detriment of energy consumption and latency. For this reason, we propose a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even with strong compression rates. We apply BottleFit on cutting-edge DNN models in image classification, and show that BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset, while state of the art such as SPINN loses up to 6% in accuracy. We experimentally measure the power consumption and latency of an image classification application running on an NVIDIA Jetson Nano board (GPU-based) and a Raspberry PI board (GPU-less). We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also compare BottleFit with state-of-the-art autoencoders-based approaches, and show that (i) BottleFit reduces power consumption and execution time respectively by up to 54% and 44% on the Jetson and 40% and 62% on Raspberry PI; (ii) the size of the head model executed on the mobile device is 83 times smaller. We publish the code repository for reproducibility of the results in this study.
LGDec 7, 2021
Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless NetworksPeyman Tehrani, Francesco Restuccia, Marco Levorato
Next Generation (NextG) networks are expected to support demanding tactile internet applications such as augmented reality and connected autonomous vehicles. Whereas recent innovations bring the promise of larger link capacity, their sensitivity to the environment and erratic performance defy traditional model-based control rationales. Zero-touch data-driven approaches can improve the ability of the network to adapt to the current operating conditions. Tools such as reinforcement learning (RL) algorithms can build optimal control policy solely based on a history of observations. Specifically, deep RL (DRL), which uses a deep neural network (DNN) as a predictor, has been shown to achieve good performance even in complex environments and with high dimensional inputs. However, the training of DRL models require a large amount of data, which may limit its adaptability to ever-evolving statistics of the underlying environment. Moreover, wireless networks are inherently distributed systems, where centralized DRL approaches would require excessive data exchange, while fully distributed approaches may result in slower convergence rates and performance degradation. In this paper, to address these challenges, we propose a federated learning (FL) approach to DRL, which we refer to federated DRL (F-DRL), where base stations (BS) collaboratively train the embedded DNN by only sharing models' weights rather than training data. We evaluate two distinct versions of F-DRL, value and policy based, and show the superior performance they achieve compared to distributed and centralized DRL.
NIOct 20, 2021
Colosseum: Large-Scale Wireless Experimentation Through Hardware-in-the-Loop Network EmulationLeonardo Bonati, Pedram Johari, Michele Polese et al.
Colosseum is an open-access and publicly-available large-scale wireless testbed for experimental research via virtualized and softwarized waveforms and protocol stacks on a fully programmable, "white-box" platform. Through 256 state-of-the-art software-defined radios and a massive channel emulator core, Colosseum can model virtually any scenario, enabling the design, development and testing of solutions at scale in a variety of deployments and channel conditions. These Colosseum radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite-impulse response filters. Filters model the taps of desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper, we introduce Colosseum as a testbed that is for the first time open to the research community. We describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research at scale through exemplary use cases including prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for Colosseum future updates concludes the paper.
CRJun 24, 2021
AKER: A Design and Verification Framework for Safe andSecure SoC Access ControlFrancesco Restuccia, Andres Meza, Ryan Kastner
Modern systems on a chip (SoCs) utilize heterogeneous architectures where multiple IP cores have concurrent access to on-chip shared resources. In security-critical applications, IP cores have different privilege levels for accessing shared resources, which must be regulated by an access control system. AKER is a design and verification framework for SoC access control. AKER builds upon the Access Control Wrapper (ACW) -- a high performance and easy-to-integrate hardware module that dynamically manages access to shared resources. To build an SoC access control system, AKER distributes the ACWs throughout the SoC, wrapping controller IP cores, and configuring the ACWs to perform local access control. To ensure the access control system is functioning correctly and securely, AKER provides a property-driven security verification using MITRE common weakness enumerations. AKER verifies the SoC access control at the IP level to ensure the absence of bugs in the functionalities of the ACW module, at the firmware level to confirm the secure operation of the ACW when integrated with a hardware root-of-trust (HRoT), and at the system level to evaluate security threats due to the interactions among shared resources. The performance, resource usage, and security of access control systems implemented through AKER is experimentally evaluated on a Xilinx UltraScale+ programmable SoC, it is integrated with the OpenTitan hardware root-of-trust, and it is used to design an access control system for the OpenPULP multicore architecture.
CRJun 14, 2021
Isadora: Automated Information Flow Property Generation for Hardware DesignsCalvin Deutschbein, Andres Meza, Francesco Restuccia et al.
Isadora is a methodology for creating information flow specifications of hardware designs. The methodology combines information flow tracking and specification mining to produce a set of information flow properties that are suitable for use during the security validation process, and which support a better understanding of the security posture of the design. Isadora is fully automated; the user provides only the design under consideration and a testbench and need not supply a threat model nor security specifications. We evaluate Isadora on a RISC-V processor plus two designs related to SoC access control. Isadora generates security properties that align with those suggested by the Common Weakness Enumerations (CWEs), and in the case of the SoC designs, align with the properties written manually by security experts.
LGMay 8, 2021
The Tags Are Alright: Robust Large-Scale RFID Clone Detection Through Federated Data-Augmented Radio FingerprintingMauro Piva, Gaia Maselli, Francesco Restuccia
Millions of RFID tags are pervasively used all around the globe to inexpensively identify a wide variety of everyday-use objects. One of the key issues of RFID is that tags cannot use energy-hungry cryptography. For this reason, radio fingerprinting (RFP) is a compelling approach that leverages the unique imperfections in the tag's wireless circuitry to achieve large-scale RFID clone detection. Recent work, however, has unveiled that time-varying channel conditions can significantly decrease the accuracy of the RFP process. We propose the first large-scale investigation into RFP of RFID tags with dynamic channel conditions. Specifically, we perform a massive data collection campaign on a testbed composed by 200 off-the-shelf identical RFID tags and a software-defined radio (SDR) tag reader. We collect data with different tag-reader distances in an over-the-air configuration. To emulate implanted RFID tags, we also collect data with two different kinds of porcine meat inserted between the tag and the reader. We use this rich dataset to train and test several convolutional neural network (CNN)--based classifiers in a variety of channel conditions. Our investigation reveals that training and testing on different channel conditions drastically degrades the classifier's accuracy. For this reason, we propose a novel training framework based on federated machine learning (FML) and data augmentation (DAG) to boost the accuracy. Extensive experimental results indicate that (i) our FML approach improves accuracy by up to 48%; (ii) our DA approach improves the FML performance by up to 31%. To the best of our knowledge, this is the first paper experimentally demonstrating the efficacy of FML and DA on a large device population. We are sharing with the research community our fully-labeled 200-GB RFID waveform dataset, the entirety of our code and trained models.
SPMar 8, 2021
Split Computing and Early Exiting for Deep Learning Applications: Survey and Research ChallengesYoshitomo Matsubara, Marco Levorato, Francesco Restuccia
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on mobile devices can quickly deplete their battery. Although task offloading to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple "exits" earlier in the architecture, each providing increasingly higher target accuracy. Therefore, the trade-off between accuracy and delay can be tuned according to the current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by providing a set of compelling research challenges.
NIMar 5, 2021
Can You Fix My Neural Network? Real-Time Adaptive Waveform Synthesis for Resilient Wireless Signal ClassificationSalvatore D'Oro, Francesco Restuccia, Tommaso Melodia
Thanks to its capability of classifying complex phenomena without explicit modeling, deep learning (DL) has been demonstrated to be a key enabler of Wireless Signal Classification (WSC). Although DL can achieve a very high accuracy under certain conditions, recent research has unveiled that the wireless channel can disrupt the features learned by the DL model during training, thus drastically reducing the classification performance in real-world live settings. Since retraining classifiers is cumbersome after deployment, existing work has leveraged the usage of carefully-tailored Finite Impulse Response (FIR) filters that, when applied at the transmitter's side, can restore the features that are lost because of the the channel actions, i.e., waveform synthesis. However, these approaches compute FIRs using offline optimization strategies, which limits their efficacy in highly-dynamic channel settings. In this paper, we improve the state of the art by proposing Chares, a Deep Reinforcement Learning (DRL)-based framework for channel-resilient adaptive waveform synthesis. Chares adapts to new and unseen channel conditions by optimally computing through DRL the FIRs in real-time. Chares is a DRL agent whose architecture is-based upon the Twin Delayed Deep Deterministic Policy Gradients (TD3), which requires minimal feedback from the receiver and explores a continuous action space. Chares has been extensively evaluated on two well-known datasets. We have also evaluated the real-time latency of Chares with an implementation on field-programmable gate array (FPGA). Results show that Chares increases the accuracy up to 4.1x when no waveform synthesis is performed, by 1.9x with respect to existing work, and can compute new actions within 41us.
NIFeb 10, 2021
SteaLTE: Private 5G Cellular Connectivity as a Service with Full-stack Wireless SteganographyLeonardo Bonati, Salvatore D'Oro, Francesco Restuccia et al.
Fifth-generation (5G) systems will extensively employ radio access network (RAN) softwarization. This key innovation enables the instantiation of "virtual cellular networks" running on different slices of the shared physical infrastructure. In this paper, we propose the concept of Private Cellular Connectivity as a Service (PCCaaS), where infrastructure providers deploy covert network slices known only to a subset of users. We then present SteaLTE as the first realization of a PCCaaS-enabling system for cellular networks. At its core, SteaLTE utilizes wireless steganography to disguise data as noise to adversarial receivers. Differently from previous work, however, it takes a full-stack approach to steganography, contributing an LTE-compliant steganographic protocol stack for PCCaaS-based communications, and packet schedulers and operations to embed covert data streams on top of traditional cellular traffic (primary traffic). SteaLTE balances undetectability and performance by mimicking channel impairments so that covert data waveforms are almost indistinguishable from noise. We evaluate the performance of SteaLTE on an indoor LTE-compliant testbed under different traffic profiles, distance and mobility patterns. We further test it on the outdoor PAWR POWDER platform over long-range cellular links. Results show that in most experiments SteaLTE imposes little loss of primary traffic throughput in presence of covert data transmissions (< 6%), making it suitable for undetectable PCCaaS networking.
CRFeb 7, 2021
What is a Blockchain? A Definition to Clarify the Role of the Blockchain in the Internet of ThingsLorenzo Ghiro, Francesco Restuccia, Salvatore D'Oro et al.
The use of the term blockchain is documented for disparate projects, from cryptocurrencies to applications for the Internet of Things (IoT), and many more. The concept of blockchain appears therefore blurred, as it is hard to believe that the same technology can empower applications that have extremely different requirements and exhibit dissimilar performance and security. This position paper elaborates on the theory of distributed systems to advance a clear definition of blockchain that allows us to clarify its role in the IoT. This definition inextricably binds together three elements that, as a whole, provide the blockchain with those unique features that distinguish it from other distributed ledger technologies: immutability, transparency and anonimity. We note however that immutability comes at the expense of remarkable resource consumption, transparency demands no confidentiality and anonymity prevents user identification and registration. This is in stark contrast to the requirements of most IoT applications that are made up of resource constrained devices, whose data need to be kept confidential and users to be clearly known. Building on the proposed definition, we derive new guidelines for selecting the proper distributed ledger technology depending on application requirements and trust models, identifying common pitfalls leading to improper applications of the blockchain. We finally indicate a feasible role of the blockchain for the IoT: myriads of local, IoT transactions can be aggregated off-chain and then be successfully recorded on an external blockchain as a means of public accountability when required.
CRMay 6, 2019
Hiding Data in Plain Sight: Undetectable Wireless Communications Through Pseudo-Noise Asymmetric Shift KeyingSalvatore D'Oro, Francesco Restuccia, Tommaso Melodia
Undetectable wireless transmissions are fundamental to avoid eavesdroppers. To address this issue, wireless steganography hides covert information inside primary information by slightly modifying the transmitted waveform such that primary information will still be decodable, while covert information will be seen as noise by agnostic receivers. Since the addition of covert information inevitably decreases the SNR of the primary transmission, key challenges in wireless steganography are: i) to assess the impact of the covert channel on the primary channel as a function of different channel conditions; and ii) to make sure that the covert channel is undetectable. Existing approaches are protocol-specific, also we notice that existing wireless technologies rely on phase-keying modulations that in most cases do not use the channel up to its Shannon capacity. Therefore, the residual capacity can be leveraged to implement a wireless system based on a pseudo-noise asymmetric shift keying (PN-ASK) modulation, where covert symbols are mapped by shifting the amplitude of primary symbols. This way, covert information will be undetectable, since a receiver expecting phase-modulated symbols will see their shift in amplitude as an effect of channel/path loss degradation. We first investigate the SER of PN-ASK as a function of the channel; then, we find the optimal PN-ASK parameters that optimize primary and covert throughput under different channel condition. We evaluate the throughput performance and undetectability of PN-ASK through extensive simulations and on an experimental testbed based on USRP N210 software-defined radios. We show that PN-ASK improves the throughput by more than 8x with respect to prior art. Finally, we demonstrate through experiments that PN-ASK is able to transmit covert data on top of IEEE 802.11g frames, which are correctly decoded by an off-the-shelf laptop WiFi.
CRMar 18, 2019
Blockchain for the Internet of Things: Present and FutureFrancesco Restuccia, Salvatore D'Oro andSalil S. Kanhere, Tommaso Melodia et al.
One of the key challenges to the IoT's success is how to secure and anonymize billions of IoT transactions and devices per day, an issue that still lingers despite significant research efforts over the last few years. On the other hand, technologies based on blockchain algorithms are disrupting today's cryptocurrency markets and showing tremendous potential, since they provide a distributed transaction ledger that cannot be tampered with or controlled by a single entity. Although the blockchain may present itself as a cure-all for the IoT's security and privacy challenges, significant research efforts still need to be put forth to adapt the computation-intensive blockchain algorithms to the stringent energy and processing constraints of today's IoT devices. In this paper, we provide an overview of existing literature on the topic of blockchain for IoT, and present a roadmap of research challenges that will need to be addressed to enable the usage of blockchain technologies in the IoT.
NIMar 12, 2019
Big Data Goes Small: Real-Time Spectrum-Driven Embedded Wireless Networking Through Deep Learning in the RF LoopFrancesco Restuccia, Tommaso Melodia
The explosion of 5G networks and the Internet of Things will result in an exceptionally crowded RF environment, where techniques such as spectrum sharing and dynamic spectrum access will become essential components of the wireless communication process. In this vision, wireless devices must be able to (i) learn to autonomously extract knowledge from the spectrum on-the-fly; and (ii) react in real time to the inferred spectrum knowledge by appropriately changing communication parameters, including frequency band, symbol modulation, coding rate, among others. Traditional CPU-based machine learning suffers from high latency, and requires application-specific and computationally-intensive feature extraction/selection algorithms. In this paper, we present RFLearn, the first system enabling spectrum knowledge extraction from unprocessed I/Q samples by deep learning directly in the RF loop. RFLearn provides (i) a complete hardware/software architecture where the CPU, radio transceiver and learning/actuation circuits are tightly connected for maximum performance; and (ii) a learning circuit design framework where the latency vs. hardware resource consumption trade-off can explored. We implement and evaluate the performance of RFLearn on custom software-defined radio built on a system-on-chip (SoC) ZYNQ-7000 device mounting AD9361 radio transceivers and VERT2450 antennas. We showcase the capabilities of RFLearn by applying it to solving the fundamental problems of modulation and OFDM parameter recognition. Experimental results reveal that RFLearn decreases latency and power by about 17x and 15x with respect to a software-based solution, with a comparatively low hardware resource consumption.
NIJan 23, 2019
Machine Learning for Wireless Communications in the Internet of Things: A Comprehensive SurveyJithin Jagannath, Nicholas Polosky, Anu Jagannath et al.
The Internet of Things (IoT) is expected to require more effective and efficient wireless communications than ever before. For this reason, techniques such as spectrum sharing, dynamic spectrum access, extraction of signal intelligence and optimized routing will soon become essential components of the IoT wireless communication paradigm. Given that the majority of the IoT will be composed of tiny, mobile, and energy-constrained devices, traditional techniques based on a priori network optimization may not be suitable, since (i) an accurate model of the environment may not be readily available in practical scenarios; (ii) the computational requirements of traditional optimization techniques may prove unbearable for IoT devices. To address the above challenges, much research has been devoted to exploring the use of machine learning to address problems in the IoT wireless communications domain. This work provides a comprehensive survey of the state of the art in the application of machine learning techniques to address key problems in IoT wireless communications with an emphasis on its ad hoc networking aspect. First, we present extensive background notions of machine learning techniques. Then, by adopting a bottom-up approach, we examine existing work on machine learning for the IoT at the physical, data-link and network layer of the protocol stack. Thereafter, we discuss directions taken by the community towards hardware implementation to ensure the feasibility of these techniques. Additionally, before concluding, we also provide a brief discussion of the application of machine learning in IoT beyond wireless communication. Finally, each of these discussions is accompanied by a detailed analysis of the related open problems and challenges.
CRMar 13, 2018
Securing the Internet of Things in the Age of Machine Learning and Software-defined NetworkingFrancesco Restuccia, Salvatore D'Oro, Tommaso Melodia
The Internet of Things (IoT) realizes a vision where billions of interconnected devices are deployed just about everywhere, from inside our bodies to the most remote areas of the globe. As the IoT will soon pervade every aspect of our lives and will be accessible from anywhere, addressing critical IoT security threats is now more important than ever. Traditional approaches where security is applied as an afterthought and as a "patch" against known attacks are insufficient. Indeed, next-generation IoT challenges will require a new secure-by-design vision, where threats are addressed proactively and IoT devices learn to dynamically adapt to different threats. To this end, machine learning and software-defined networking will be key to provide both reconfigurability and intelligence to the IoT devices. In this paper, we first provide a taxonomy and survey the state of the art in IoT security research, and offer a roadmap of concrete research challenges related to the application of machine learning and software-defined networking to address existing and next-generation IoT security threats.