Mahmoud Al-Qutayri

CR
7papers
82citations
Novelty44%
AI Score38

7 Papers

NEJul 11, 2023
Number Systems for Deep Neural Network Architectures: A Survey

Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh et al.

Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted

88.5ETMar 19
From Connectivity to Multi-Orbit Intelligence: Space-Based Data Center Architectures for 6G and Beyond

Shimaa Naser, Maryam Tariq, Raneem Abdel-Rahim et al.

Direct handset-to-satellite (DHTS) communication is emerging as a core capability of 6G non-terrestrial networks, enabling standard devices to directly access low Earth orbit (LEO) satellites. While LEO provides the physical access layer for DHTS, large-scale device connectivity introduces challenges in mobility management, interference control, spectrum efficiency, and constellation-wide coordination. Relay-only LEO architectures are insufficient to manage massive handset access under dynamic traffic and energy constraints. This article introduces a hierarchical architecture in which direct handset-to-LEO access is supported by multi-orbit space-based data centers (SBDCs) spanning LEO, medium Earth orbit (MEO), and geostationary Earth orbit (GEO). In this framework, LEO satellites handle radio access and real-time inference, while higher orbital layers provide regional aggregation, global orchestration, and compute-aware routing. By embedding distributed in-orbit computing, energy-aware scheduling, and AI-driven hierarchical control, the constellation evolves from a passive relay network into an intelligent multi-layer system capable of supporting large-scale DHTS services. We discuss key enabling technologies, envisioned multi-orbit integrated Earth-space compute architecture, and open research challenges in integrating multi-orbit computing, highlighting pathways toward scalable and resilient 6G DHTS networks.

LGJul 17, 2024
Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference

Ghadeer Jaradat, Mohammed Tolba, Ghada Alsuhli et al.

In the world of deep learning, Transformer models have become very significant, leading to improvements in many areas from understanding language to recognizing images, covering a wide range of applications. Despite their success, the deployment of these models in real-time applications, particularly on edge devices, poses significant challenges due to their quadratic computational intensity and memory demands. To overcome these challenges we introduce a novel Hybrid Dynamic Pruning (HDP), an efficient algorithm-architecture co-design approach that accelerates transformers using head sparsity, block sparsity and approximation opportunities to reduce computations in attention and reduce memory access. With the observation of the huge redundancy in attention scores and attention heads, we propose a novel integer-based row-balanced block pruning to prune unimportant blocks in the attention matrix at run time, also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time. Also we propose an approximation method that reduces attention computations. To efficiently support these methods with lower latency and power efficiency, we propose a HDP co-processor architecture.

AIOct 7, 2021
Towards Federated Learning-Enabled Visible Light Communication in 6G Systems

Shimaa Naser, Lina Bariah, Sami Muhaidat et al.

Visible light communication (VLC) technology was introduced as a key enabler for the next generation of wireless networks, mainly thanks to its simple and low-cost implementation. However, several challenges prohibit the realization of the full potentials of VLC, namely, limited modulation bandwidth, ambient light interference, optical diffuse reflection effects, devices non-linearity, and random receiver orientation. On the contrary, centralized machine learning (ML) techniques have demonstrated a significant potential in handling different challenges relating to wireless communication systems. Specifically, it was shown that ML algorithms exhibit superior capabilities in handling complicated network tasks, such as channel equalization, estimation and modeling, resources allocation, and opportunistic spectrum access control, to name a few. Nevertheless, concerns pertaining to privacy and communication overhead when sharing raw data of the involved clients with a server constitute major bottlenecks in the implementation of centralized ML techniques. This has motivated the emergence of a new distributed ML paradigm, namely federated learning (FL), which can reduce the cost associated with transferring raw data, and preserve privacy by training ML models locally and collaboratively at the clients' side. Hence, it becomes evident that integrating FL into VLC networks can provide ubiquitous and reliable implementation of VLC systems. With this motivation, this is the first in-depth review in the literature on the application of FL in VLC networks. To that end, besides the different architectures and related characteristics of FL, we provide a thorough overview on the main design aspects of FL based VLC systems. Finally, we also highlight some potential future research directions of FL that are envisioned to substantially enhance the performance and robustness of VLC systems.

CVApr 28, 2021
Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Mohammed F. Tolba, Huruy Tekle Tesfai, Hani Saleh et al.

Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weights are approximated based on the linear and quadratic approximation methods during the training phase, then, all of the weights are replaced with the linear/quadratic coefficients to execute the inference in a way where different weights could be computed using the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of the DNN sub-computations (computational reuse) and leverage the same data (data reuse) to reduce DNNs computations, memory accesses, and improve energy efficiency albeit at the cost of increased training time. Complete analysis for both MNIST and CIFAR 10 datasets is presented for image recognition , where LeNet 5 revealed a reduction in the number of parameters by a factor of 1211.3x with a drop of less than 0.9% in accuracy. When compared to the state of the art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices as it reduces the memory size requirement as well as the number of needed memory accesses.

CRDec 29, 2020
UNSAIL: Thwarting Oracle-Less Machine Learning Attacks on Logic Locking

Lilas Alrahis, Satwik Patnaik, Johann Knechtel et al.

Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic locking technique that inserts key-gate structures with the specific aim to confuse ML models like those used in SAIL. More specifically, UNSAIL serves to prevent attacks seeking to resolve the structural transformations of synthesis-induced obfuscation, which is an essential step for logic locking. Our approach is generic; it can protect any local structure of key-gates against such ML-based attacks in an oracle-less setting. We develop a reference implementation for the SAIL attack and launch it on both traditionally locked and UNSAIL-locked designs. In SAIL, a change-prediction model is used to determine which key-gate structures to restore using a reconstruction model. Our study on benchmarks ranging from the ISCAS-85 and ITC-99 suites to the OpenRISC Reference Platform System-on-Chip (ORPSoC) confirms that UNSAIL degrades the accuracy of the change-prediction model and the reconstruction model by an average of 20.13 and 17 percentage points (pp) respectively. When the aforementioned models are combined, which is the most powerful scenario for SAIL, UNSAIL reduces the attack accuracy of SAIL by an average of 11pp. We further demonstrate that UNSAIL thwarts other oracle-less attacks, i.e., SWEEP and the redundancy attack, indicating the generic nature and strength of our approach. Detailed layout-level evaluations illustrate that UNSAIL incurs minimal area and power overheads of 0.26% and 0.61%, respectively, on the million-gate ORPSoC design.

CRSep 10, 2019
ScanSAT: Unlocking Static and Dynamic Scan Obfuscation

Lilas Alrahis, Muhammad Yasin, Nimisha Limaye et al.

While financially advantageous, outsourcing key steps, such as testing, to potentially untrusted Outsourced Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hides the transformation functions that map the scan-in stimulus (scan-out response) and the delivered scan pattern (captured response). While static scan obfuscation utilizes the same secret key, and thus, the same secret transformation functions throughout the lifetime of the chip, dynamic scan obfuscation updates the key periodically. In this paper, we propose ScanSAT: an attack that transforms a scan obfuscated circuit to its logic-locked version and applies the Boolean satisfiability (SAT) based attack, thereby extracting the secret key. We implement our attack, apply on representative scan obfuscation techniques, and show that ScanSAT can break both static and dynamic scan obfuscation schemes with 100% success rate. Moreover, ScanSAT is effective even for large key sizes and in the presence of scan compression.