Sudeep Pasricha

LG
h-index: 36 · Papers: 57 · Citations: 906 · Novelty: 49% · AI Score: 54

57 Papers

SP · May 17, 2022 · Code
A Framework for CSI-Based Indoor Localization with 1D Convolutional Neural Networks

Liping Wang, Sudeep Pasricha

Modern indoor localization techniques are essential to overcome the weak GPS coverage in indoor environments. Recently, considerable progress has been made in Channel State Information (CSI) based indoor localization with signal fingerprints. However, CSI signal patterns can be complicated in large and highly dynamic indoor spaces with complex interiors, so a solution to this issue is urgently needed to expand the applications of CSI to a broader range of indoor spaces. In this paper, we propose an end-to-end solution including data collection, pattern clustering, denoising, calibration, and a lightweight one-dimensional convolutional neural network (1D CNN) model with CSI fingerprinting to tackle this problem. We have also created and plan to open source a CSI dataset with a large amount of data collected across complex indoor environments at Colorado State University. Experiments indicate that our approach achieves up to 68.5% improved performance (mean distance error) with a minimal number of parameters, compared to the best-known deep machine learning and CSI-based indoor localization works.
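The headline result above is stated in terms of mean distance error. As a minimal sketch of how such a metric is commonly computed, assuming 2D Euclidean coordinates (e.g., meters) and reading "improved performance" as a relative reduction in error, the snippet below defines both quantities. The function names are illustrative and the paper's exact evaluation protocol is not given in the abstract.

```python
import math


def mean_distance_error(predicted, actual):
    """Mean Euclidean distance between predicted and ground-truth 2D locations.

    Assumes both arguments are equal-length lists of (x, y) tuples in the same units.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty location lists")
    total = sum(math.dist(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


def relative_improvement(baseline_mde, new_mde):
    """Fractional reduction in error versus a baseline (0.685 corresponds to 68.5%)."""
    return (baseline_mde - new_mde) / baseline_mde
```

For example, predictions at (0, 0) and (3, 4) against two ground-truth points at the origin give errors of 0 m and 5 m, so the mean distance error is 2.5 m.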

ET · Aug 7, 2023
Analysis of Optical Loss and Crosstalk Noise in MZI-based Coherent Photonic Neural Networks

Amin Shafiee, Sanmitra Banerjee, Krishnendu Chakrabarty et al.

With the continuous increase in the size and complexity of machine learning models, the need for specialized hardware to efficiently run such models is rapidly growing. To address such a need, silicon-photonic-based neural network (SP-NN) accelerators have recently emerged as a promising alternative to electronic accelerators due to their lower latency and higher energy efficiency. Not only can SP-NNs alleviate the fan-in and fan-out problem with linear algebra processors, but their operational bandwidth can also match that of the photodetection rate (typically 100 GHz), which is over an order of magnitude faster than electronic counterparts that are restricted to a clock rate of a few GHz. Unfortunately, the underlying silicon photonic devices in SP-NNs suffer from inherent optical losses and crosstalk noise originating from fabrication imperfections and undesired optical couplings, the impact of which accumulates as the network scales up. Consequently, the inferencing accuracy in an SP-NN can be affected by such inefficiencies (e.g., it can drop to below 10%), an impact that is yet to be fully studied. In this paper, we comprehensively model the optical loss and crosstalk noise using a bottom-up approach, from the device to the system level, in coherent SP-NNs built using Mach-Zehnder interferometer (MZI) devices. The proposed models can be applied to any SP-NN architecture with different configurations to analyze the effect of loss and crosstalk. Such an analysis is important when designing an SP-NN that must meet inferencing accuracy and scalability requirements. Using the proposed analytical framework, we show a high power penalty and a catastrophic inferencing accuracy drop of up to 84% for SP-NNs of different scales with three known MZI mesh configurations (i.e., Reck, Clements, and Diamond) due to accumulated optical loss and crosstalk noise.
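For intuition about how optical loss compounds with network depth, the sketch below builds the 2x2 transfer matrix of a single MZI from two ideal 50:50 directional couplers and two phase shifters, then cascades identical MZIs with an assumed uniform insertion loss per device. This is not the paper's bottom-up model: the loss value, phase settings, and function names are illustrative assumptions, and crosstalk is not modeled at all.

```python
import numpy as np


def directional_coupler():
    # Ideal, lossless 50:50 coupler (unitary 2x2 matrix)
    return np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)


def phase_shifter(theta):
    # Phase shift applied to the upper arm only
    return np.diag([np.exp(1j * theta), 1.0])


def mzi(theta, phi, loss_db=0.0):
    """2x2 MZI: external phase phi, coupler, internal phase theta, coupler.

    loss_db is an assumed lumped insertion loss for the whole device,
    applied equally to both ports.
    """
    amplitude = 10 ** (-loss_db / 20)  # power loss in dB -> field amplitude factor
    return amplitude * (directional_coupler() @ phase_shifter(theta)
                        @ directional_coupler() @ phase_shifter(phi))


def cascade_output_power(n_stages, loss_db, theta=0.3, phi=1.1):
    # Launch unit power into port 0, pass it through n identical MZIs,
    # and return the total optical power at the output ports
    field = np.array([1.0 + 0j, 0.0 + 0j])
    for _ in range(n_stages):
        field = mzi(theta, phi, loss_db) @ field
    return float(np.sum(np.abs(field) ** 2))
```

Because each lossless MZI is unitary, only the lumped loss changes the total power, so losses in dB add along a path: 10 stages at an assumed 0.5 dB each leave about 10^(-0.5), roughly 0.316, of the input power.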

LG · Mar 22, 2023
TRON: Transformer Neural Network Acceleration with Non-Coherent Silicon Photonics

Salma Afifi, Febin Sunny, Mahdi Nikdast et al.

Transformer neural networks are rapidly being integrated into state-of-the-art solutions for natural language processing (NLP) and computer vision. However, the complex structure of these models creates challenges for accelerating their execution on conventional electronic platforms. We propose the first silicon photonic hardware neural network accelerator, called TRON, for transformer-based models such as BERT and Vision Transformers. Our analysis demonstrates that TRON exhibits at least 14x better throughput and 8x better energy efficiency in comparison to state-of-the-art transformer accelerators.

LG · Jul 4, 2023
FedHIL: Heterogeneity Resilient Federated Learning for Robust Indoor Localization with Mobile Devices

Danish Gufran, Sudeep Pasricha

Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62x better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.
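FedHIL builds on federated learning, in which a server combines model weights trained locally on users' devices without collecting their raw data. The abstract does not describe FedHIL's selective weight adjustment, so the sketch below shows only the standard FedAvg aggregation step that FL frameworks commonly start from, with each client's weights scaled by its share of the training samples. The function name and the list-of-arrays weight layout are illustrative assumptions.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Standard FedAvg aggregation (baseline only, not FedHIL's method).

    client_weights: one list of per-layer numpy arrays per client.
    client_sizes: number of local training samples per client.
    Returns the per-layer weighted average.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]
```

Weighting by sample count means a device that contributed three times as much data pulls the global model three times as hard; with heterogeneous, noisy devices, that is exactly the behavior a domain-specific adjustment would need to temper.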

SP · May 17, 2022
Multi-Head Attention Neural Network for Smartphone Invariant Indoor Localization

Saideep Tiku, Danish Gufran, Sudeep Pasricha

Smartphones together with RSSI fingerprinting serve as an efficient approach for delivering a low-cost and high-accuracy indoor localization solution. However, a few critical challenges have prevented the widespread proliferation of this technology in the public domain. One such critical challenge is device heterogeneity, i.e., the variation in the RSSI signal characteristics captured across different smartphone devices. In the real world, the smartphones or IoT devices used to capture RSSI fingerprints typically vary across users of an indoor localization service. Conventional indoor localization solutions may not be able to cope with device-induced variations that can degrade their localization accuracy. We propose a multi-head attention neural network-based indoor localization framework that is resilient to device heterogeneity. An in-depth analysis of our proposed framework across a variety of indoor environments demonstrates up to 35% accuracy improvement compared to state-of-the-art indoor localization techniques.

CY · Nov 2, 2022
AI Ethics in Smart Healthcare

Sudeep Pasricha

This article reviews the landscape of ethical challenges of integrating artificial intelligence (AI) into smart healthcare products, including medical electronic devices. Differences between traditional ethics in the medical domain and emerging ethical challenges with AI-driven healthcare are presented, particularly as they relate to transparency, bias, privacy, safety, responsibility, justice, and autonomy. Open challenges and recommendations are outlined to enable the integration of ethical principles into the design, validation, clinical trials, deployment, monitoring, repair, and retirement of AI-based smart healthcare products.

AR · May 17, 2022
A Silicon Photonic Accelerator for Convolutional Neural Networks with Heterogeneous Quantization

Febin Sunny, Mahdi Nikdast, Sudeep Pasricha

Parameter quantization in convolutional neural networks (CNNs) can help generate efficient models with lower memory footprint and computational complexity. However, homogeneous quantization can result in significant degradation of CNN model accuracy. In contrast, heterogeneous quantization represents a promising approach to realize compact, quantized models with higher inference accuracies. In this paper, we propose HQNNA, a CNN accelerator based on non-coherent silicon photonics that can accelerate both homogeneously quantized and heterogeneously quantized CNN models. Our analyses show that HQNNA achieves up to 73.8x better energy-per-bit and 159.5x better throughput-energy efficiency than state-of-the-art photonic CNN accelerators.
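To make the homogeneous/heterogeneous distinction concrete at the model level, the sketch below applies uniform symmetric per-tensor quantization with a separate bit width per layer; passing the same width for every layer gives the homogeneous case. The quantization scheme and function names are assumptions for illustration and say nothing about how HQNNA maps such models onto photonic hardware.

```python
import numpy as np


def quantize_symmetric(weights, n_bits):
    """Uniform symmetric quantization to signed n_bits levels; returns dequantized values."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax)  # integer codes
    return q * scale


def quantize_model(layers, bit_widths):
    """Heterogeneous quantization: each layer gets its own bit width.

    With identical bit widths for all layers, this reduces to homogeneous quantization.
    """
    return [quantize_symmetric(w, b) for w, b in zip(layers, bit_widths)]
```

A heterogeneous scheme might, for instance, keep sensitive layers at 8 bits while squeezing tolerant ones to 4 bits, trading a small amount of error in some layers for a smaller overall model.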

LG · Aug 31, 2022
RecLight: A Recurrent Neural Network Accelerator with Integrated Silicon Photonics

Febin Sunny, Mahdi Nikdast, Sudeep Pasricha

Recurrent Neural Networks (RNNs) are used in applications that learn dependencies in data sequences, such as speech recognition, human activity recognition, and anomaly detection. In recent years, newer RNN variants, such as GRUs and LSTMs, have been used for implementing these applications. As many of these applications are employed in real-time scenarios, accelerating RNN/LSTM/GRU inference is crucial. In this paper, we propose a novel photonic hardware accelerator called RecLight for accelerating simple RNNs, GRUs, and LSTMs. Simulation results indicate that RecLight achieves 37x lower energy-per-bit and 10% better throughput compared to the state-of-the-art.

AI · Feb 18, 2023
VITAL: Vision Transformer Neural Networks for Accurate Smartphone Heterogeneity Resilient Indoor Localization

Danish Gufran, Saideep Tiku, Sudeep Pasricha

Wi-Fi fingerprinting-based indoor localization is an emerging embedded application domain that leverages existing Wi-Fi access points (APs) in buildings to localize users with smartphones. Unfortunately, the heterogeneity of wireless transceivers across diverse smartphones carried by users has been shown to reduce the accuracy and reliability of localization algorithms. In this paper, we propose a novel framework based on vision transformer neural networks called VITAL that addresses this important challenge. Experiments indicate that VITAL can reduce the uncertainty created by smartphone heterogeneity while improving localization accuracy by 41% to 68% over the best-known prior works. We also demonstrate the generalizability of our approach and propose a data augmentation technique that can be integrated into most deep learning-based localization frameworks to improve accuracy.

DC · Aug 24, 2023
SHIELD: Sustainable Hybrid Evolutionary Learning Framework for Carbon, Wastewater, and Energy-Aware Data Center Management

Sirui Qi, Dejan Milojicic, Cullen Bash et al.

Today's cloud data centers are often distributed geographically to provide robust data services. But these geo-distributed data centers (GDDCs) have a significant associated environmental impact due to their increasing carbon emissions and water usage, which needs to be curtailed. Moreover, the energy costs of operating these data centers continue to rise. This paper proposes a novel framework to co-optimize carbon emissions, water footprint, and energy costs of GDDCs, using a hybrid workload management framework called SHIELD that integrates machine learning guided local search with a decomposition-based evolutionary algorithm. Our framework considers geographical factors and time-based differences in power generation/use, costs, and environmental impacts to intelligently manage workload distribution across GDDCs and data center operation. Experimental results show that SHIELD can realize a 34.4x speedup and a 2.1x improvement in Pareto Hypervolume while reducing the carbon footprint by up to 3.7x, water footprint by up to 1.8x, and energy costs by up to 1.3x, with a cumulative improvement across all objectives (carbon, water, cost) of up to 4.8x compared to the state-of-the-art.

CV · Mar 3, 2023
R-TOSS: A Framework for Real-Time Object Detection using Semi-Structured Pruning

Abhishek Balasubramaniam, Febin P Sunny, Sudeep Pasricha

Object detectors used in autonomous vehicles can have high memory and computational overheads. In this paper, we introduce a novel semi-structured pruning framework called R-TOSS that overcomes the shortcomings of state-of-the-art model pruning techniques. Experimental results on the Jetson TX2 show that R-TOSS achieves a compression rate of 4.4x on the YOLOv5 object detector with a 2.15x speedup in inference time and a 57.01% decrease in energy usage. R-TOSS also enables 2.89x compression on RetinaNet with a 1.86x speedup in inference time and a 56.31% decrease in energy usage. We also demonstrate significant improvements compared to various state-of-the-art pruning techniques.

AR · Jul 4, 2023
GHOST: A Graph Neural Network Accelerator using Silicon Photonics

Salma Afifi, Febin Sunny, Amin Shafiee et al.

Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefited enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It separately implements the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.

AR · Jan 28, 2023
Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon Photonics

Febin Sunny, Ebadollah Taheri, Mahdi Nikdast et al.

Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing. However, the evolution of electronic accelerators is facing fundamental limits due to the limited computation density of monolithic processing chips and the reliance on slow metallic interconnects. In this paper, we present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators. We describe how cross-layer design and fabrication of optical devices, circuits, and architectures, and hardware/software codesign can help design efficient photonics-based 2.5D chiplet platforms to accelerate emerging ML workloads.

DC · May 17, 2022
A Survey on Machine Learning for Geo-Distributed Cloud Data Center Management

Ninad Hogade, Sudeep Pasricha

Cloud workloads today are typically managed in a distributed environment and processed across geographically distributed data centers. Cloud service providers have been distributing data centers globally to reduce operating costs while also improving quality of service by using intelligent workload and resource management strategies. Such large-scale and complex orchestration of software workload and hardware resources remains a difficult problem to solve efficiently. Researchers and practitioners have been trying to address this problem by proposing a variety of cloud management techniques. Mathematical optimization techniques have historically been used to address cloud management issues. But these techniques are difficult to scale to geo-distributed problem sizes and have limited applicability in dynamic heterogeneous system environments, forcing cloud service providers to explore intelligent data-driven and Machine Learning (ML) based alternatives. The characterization, prediction, control, and optimization of complex, heterogeneous, and ever-changing distributed cloud resources and workloads employing ML methodologies have received much attention in recent years. In this article, we review the state-of-the-art ML techniques for the cloud data center management problem. We examine the challenges and the issues in current research focused on ML for cloud management and explore strategies for addressing these issues. We also discuss advantages and disadvantages of ML techniques presented in the recent literature and make recommendations for future research directions.

LG · Nov 10, 2023
CALLOC: Curriculum Adversarial Learning for Secure and Robust Indoor Localization

Danish Gufran, Sudeep Pasricha

Indoor localization has become increasingly vital for many applications, from tracking assets to delivering personalized services. Yet, achieving pinpoint accuracy remains a challenge due to variations across indoor environments and devices used to assist with localization. Another emerging challenge is adversarial attacks on indoor localization systems that not only threaten service integrity but also reduce localization accuracy. To combat these challenges, we introduce CALLOC, a novel framework designed to resist adversarial attacks and variations across indoor environments and devices that reduce system accuracy and reliability. CALLOC employs a novel adaptive curriculum learning approach with a domain-specific lightweight scaled dot-product attention neural network, tailored for adversarial and variation resilience in practical use cases with resource-constrained mobile devices. Experimental evaluations demonstrate that CALLOC can achieve improvements of up to 6.03x in mean error and 4.6x in worst-case error against state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and adversarial attack scenarios.
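The abstract names scaled dot-product attention as the core of CALLOC's network. The generic operation, softmax(QK^T / sqrt(d_k))V, is sketched below in NumPy for single-head, unbatched inputs. How CALLOC derives queries, keys, and values from signal fingerprints, and its curriculum schedule, are not described here, so this shows only the standard building block under those assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    q: (n, d_k) queries, k: (m, d_k) keys, v: (m, d_v) values.
    Returns the (n, d_v) outputs and the (n, m) attention weights.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights
```

Scaling by sqrt(d_k) keeps the dot products from growing with the feature dimension, which would otherwise push the softmax toward near one-hot weights; a lightweight model for mobile devices would typically keep d_k small in any case.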

CY · Dec 23, 2022
Ethical Design of Computers: From Semiconductors to IoT and Artificial Intelligence

Sudeep Pasricha, Marilyn Wolf

Computing systems are tightly integrated today into our professional, social, and private lives. An important consequence of this growing ubiquity of computing is that it can have significant ethical implications of which computing professionals should take account. In most real-world scenarios, it is not immediately obvious how particular technical choices during the design and use of computing systems could be viewed from an ethical perspective. This article provides a perspective on the ethical challenges within semiconductor chip design, IoT applications, and the increasing use of artificial intelligence in the design processes, tools, and hardware-software stacks of these systems.

LG · May 17, 2022
Robust Perception Architecture Design for Automotive Cyber-Physical Systems

Joydeep Dey, Sudeep Pasricha

In emerging automotive cyber-physical systems (CPS), accurate environmental perception is critical to achieving safety and performance goals. Enabling robust perception for vehicles requires solving multiple complex problems related to sensor selection/placement, object detection, and sensor fusion. Current methods address these problems in isolation, which leads to inefficient solutions. We present PASTA, a novel framework for global co-optimization of deep learning and sensing for dependable vehicle perception. Experimental results with the Audi-TT and BMW-Minicooper vehicles show how PASTA can find robust, vehicle-specific perception architecture solutions.

CY · Oct 21, 2022
Ethics for Digital Medicine: A Path for Ethical Emerging Medical IoT Design

Sudeep Pasricha

The dawn of the digital medicine era, ushered in by increasingly powerful embedded systems and Internet of Things (IoT) computing devices, is creating new therapies and biomedical solutions that promise to positively transform our quality of life. However, the digital medicine revolution also creates unforeseen and complex ethical, regulatory, and societal issues. In this article, we reflect on the ethical challenges facing digital medicine. We discuss the perils of ethical oversights in medical devices, and the role of professional codes and regulatory oversight towards the ethical design, deployment, and operation of digital medicine devices that safely and effectively meet the needs of patients. We advocate for an ensemble approach of intensive education, programmable ethical behaviors, and ethical analysis frameworks, to prevent mishaps and sustain ethical innovation, design, and lifecycle management of emerging digital medicine devices.

AR · May 26, 2022
RACE: A Reinforcement Learning Framework for Improved Adaptive Control of NoC Channel Buffers

Kamil Khan, Sudeep Pasricha, Ryan Gary Kim

Network-on-chip (NoC) architectures rely on buffers to store flits to cope with contention for router resources during packet switching. Recently, reversible multi-function channel (RMC) buffers have been proposed to simultaneously reduce power and enable adaptive NoC buffering between adjacent routers. While adaptive buffering can improve NoC performance by maximizing buffer utilization, controlling the RMC buffer allocations requires a congestion-aware, scalable, and proactive policy. In this work, we present RACE, a novel reinforcement learning (RL) framework that utilizes better awareness of network congestion and a new reward metric ("falsefulls") to help guide the RL agent towards better RMC buffer control decisions. We show that RACE reduces NoC latency by up to 48.9%, and energy consumption by up to 47.1% against state-of-the-art NoC buffer control policies.

LG · Aug 9, 2024
AI and Machine Learning Driven Indoor Localization and Navigation with Mobile Embedded Systems

Sudeep Pasricha

Indoor navigation is a foundational technology to assist the tracking and localization of humans, autonomous vehicles, drones, and robots in indoor spaces. Due to the lack of penetration of GPS signals in buildings, subterranean locales, and dense urban environments, indoor navigation solutions typically make use of ubiquitous wireless signals (e.g., WiFi) and sensors in mobile embedded systems to perform tracking and localization. This article provides an overview of the many challenges facing state-of-the-art indoor navigation solutions, and then describes how AI algorithms deployed on mobile embedded systems can overcome these challenges.

LG · Mar 22, 2023
Cross-Layer Design for AI Acceleration with Non-Coherent Optical Computing

Febin Sunny, Mahdi Nikdast, Sudeep Pasricha

Emerging AI applications such as ChatGPT, graph convolutional networks, and other deep neural networks require massive computational resources for training and inference. Contemporary computing platforms such as CPUs, GPUs, and TPUs are struggling to keep up with the demands of these AI applications. Non-coherent optical computing represents a promising approach for light-speed acceleration of AI workloads. In this paper, we show how cross-layer design can overcome challenges in non-coherent optical computing platforms. We describe approaches for optical device engineering, tuning circuit enhancements, and architectural innovations to adapt optical computing to a variety of AI workloads. We also discuss techniques for hardware/software co-design that can intelligently map and adapt AI software to improve its performance on non-coherent optical computing platforms.

LG · Mar 10, 2023
MOELA: A Multi-Objective Evolutionary/Learning Design Space Exploration Framework for 3D Heterogeneous Manycore Platforms

Sirui Qi, Yingheng Li, Sudeep Pasricha et al.

To enable emerging applications such as deep machine learning and graph processing, 3D network-on-chip (NoC) enabled heterogeneous manycore platforms that can integrate many processing elements (PEs) are needed. However, designing such complex systems with multiple objectives can be challenging due to the huge associated design space and long evaluation times. To optimize such systems, we propose a new multi-objective design space exploration framework called MOELA that combines the benefits of evolutionary-based search with a learning-based local search to quickly determine PE and communication link placement to optimize multiple objectives (e.g., latency, throughput, and energy) in 3D NoC enabled heterogeneous manycore systems. Compared to state-of-the-art approaches, MOELA increases the speed of finding solutions by up to 128x, leads to a better Pareto Hypervolume (PHV) by up to 12.14x and improves energy-delay-product (EDP) by up to 7.7% in a 5-objective scenario.
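MOELA's results are reported partly as Pareto Hypervolume (PHV): the volume of objective space dominated by a solution set and bounded by a reference point, where a larger value means a better front. For intuition, the sketch below computes it for two minimized objectives (e.g., latency and energy) by summing horizontal strips under the sorted front. The 5-objective case in the abstract needs a general hypervolume algorithm; the reference point and function names here are illustrative assumptions, not MOELA's code.

```python
def pareto_front(points):
    """Return the non-dominated points when both objectives are minimized."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded above by reference point ref."""
    front = sorted(
        set(p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1])
    )
    volume = 0.0
    prev_y = ref[1]
    for x, y in front:  # ascending x implies descending y on a 2D front
        volume += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return volume
```

For the front (1, 3), (2, 2), (3, 1) with reference point (4, 4), the strips contribute 3 + 2 + 1 for a hypervolume of 6.0; adding a dominated point such as (3, 3) leaves the value unchanged.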

AR · Jul 11, 2024
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration

Febin Sunny, Amin Shafiee, Abhishek Balasubramaniam et al.

Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.

SP · Jul 14, 2024
SENTINEL: Securing Indoor Localization against Adversarial Attacks with Capsule Neural Networks

Danish Gufran, Pooja Anandathirtha, Sudeep Pasricha

With the increasing demand for edge device powered location-based services in indoor environments, Wi-Fi received signal strength (RSS) fingerprinting has become popular, given the unavailability of GPS indoors. However, achieving robust and efficient indoor localization faces several challenges, due to RSS fluctuations from dynamic changes in indoor environments and heterogeneity of edge devices, leading to diminished localization accuracy. While advances in machine learning (ML) have shown promise in mitigating these phenomena, it remains an open problem. Additionally, emerging threats from adversarial attacks on ML-enhanced indoor localization systems, especially those introduced by malicious or rogue access points (APs), can deceive ML models to further increase localization errors. To address these challenges, we present SENTINEL, a novel embedded ML framework utilizing modified capsule neural networks to bolster the resilience of indoor localization solutions against adversarial attacks, device heterogeneity, and dynamic RSS fluctuations. We also introduce RSSRogueLoc, a novel dataset capturing the effects of rogue APs from several real-world indoor environments. Experimental evaluations demonstrate that SENTINEL achieves significant improvements, with up to 3.5x reduction in mean error and 3.4x reduction in worst-case error compared to state-of-the-art frameworks using simulated adversarial attacks. SENTINEL also achieves improvements of up to 2.8x in mean error and 2.7x in worst-case error compared to state-of-the-art frameworks when evaluated with the real-world RSSRogueLoc dataset.

LG · Mar 3, 2023
Adversarial Attacks on Machine Learning in Embedded and IoT Platforms

Christian Westbrook, Sudeep Pasricha

Machine learning (ML) algorithms are increasingly being integrated into embedded and IoT systems that surround us, and they are vulnerable to adversarial attacks. The deployment of these ML algorithms on resource-limited embedded platforms also requires the use of model compression techniques. The impact of such model compression techniques on adversarial robustness in ML is an important and emerging area of research. This article provides an overview of the landscape of adversarial attacks and ML model compression techniques relevant to embedded systems. We then describe efforts that seek to understand the relationship between adversarial attacks and ML model compression before discussing open problems in this area.

AR · Jul 17, 2024
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks

Salma Afifi, Ishan Thakkar, Sudeep Pasricha

Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing-in-memory (PIM) and near-memory computing (NMC) are promising solutions to accelerating transformers as they offer high compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and massive amounts of data that need to be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer models. By making minimal changes to the conventional DRAM arrays, ARTEMIS efficiently alleviates the costs associated with transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulations using a novel in-DRAM metal-on-metal capacitor. Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
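ARTEMIS relies on stochastic computing for multiplications. In the unipolar form of stochastic computing, a value in [0, 1] is encoded as the probability that each bit of a stream is 1, and a bitwise AND of two independent streams yields a stream whose density of ones approximates the product. The software sketch below illustrates only that principle; the in-DRAM implementation and the analog accumulation are not modeled, and the stream length and seed are arbitrary choices.

```python
import random


def to_bitstream(p, length, rng):
    """Unipolar stochastic encoding: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length=4096, seed=0):
    """Estimate a * b by ANDing two independent unipolar bitstreams."""
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))  # AND gate per bit pair
    return ones / length
```

The appeal for hardware is that each multiplication collapses to one AND per bit; the cost is precision, since the estimate's error shrinks only with the square root of the stream length.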

44.7 · LG · Mar 20
ARMOR: Adaptive Resilience Against Model Poisoning Attacks in Continual Federated Learning for Mobile Indoor Localization

Danish Gufran, Akhil Singampalli, Sudeep Pasricha

Indoor localization has become increasingly essential for applications ranging from asset tracking to delivering personalized services. Federated learning (FL) offers a privacy-preserving approach by training a centralized global model (GM) using distributed data from mobile devices without sharing raw data. However, real-world deployments require a continual federated learning (CFL) setting, where the GM receives continual updates under device heterogeneity and evolving indoor environments. In such dynamic conditions, erroneous or biased updates can cause the GM to deviate from its expected learning trajectory, gradually degrading internal GM representations and GM localization performance. This vulnerability is further exacerbated by adversarial model poisoning attacks. To address this challenge, we propose ARMOR, a novel CFL-based framework that monitors and safeguards the GM during continual updates. ARMOR introduces a novel state-space model (SSM) that learns the historical evolution of GM weight tensors and predicts the expected next state of weight tensors of the GM. By comparing incoming local updates with this SSM projection, ARMOR detects deviations and selectively mitigates corrupted updates before local updates are aggregated with the GM. This mechanism enables robust adaptation to temporal environmental dynamics and mitigates the effects of model poisoning attacks while preventing GM corruption. Experimental evaluations in real-world conditions indicate that ARMOR achieves notable improvements, with up to 8.0x reduction in mean error and 4.97x reduction in worst-case error compared to state-of-the-art indoor localization frameworks, demonstrating strong resilience against model corruption tested using real-world data and mobile devices.
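The detection step described above (project the GM's expected next state, compare each incoming local update against it, and exclude outliers before aggregation) can be illustrated with a deliberately simple stand-in. In the hypothetical sketch below, linear extrapolation from the last two GM weight vectors replaces ARMOR's learned SSM, and a median-relative distance cutoff replaces its mitigation rule; both substitutions, the `tolerance` parameter, and all function names are assumptions, not ARMOR's method.

```python
import numpy as np


def predict_next(history):
    """Hypothetical stand-in for ARMOR's SSM: linearly extrapolate the
    next GM weight vector from the last two states in `history`."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def filter_updates(history, local_updates, tolerance=2.0):
    """Keep local weight vectors within `tolerance` x the median distance
    to the projected next state (hypothetical rule)."""
    projected = predict_next(history)
    dists = np.array([np.linalg.norm(u - projected) for u in local_updates])
    cutoff = tolerance * np.median(dists)
    return [u for u, d in zip(local_updates, dists) if d <= cutoff]


def aggregate(history, local_updates, tolerance=2.0):
    # Average only the updates that passed the deviation check;
    # at least half of them always pass because the cutoff is >= the median
    kept = filter_updates(history, local_updates, tolerance)
    return np.mean(kept, axis=0)
```

With a GM history moving from all-zeros to all-ones, the projection is all-twos, so updates near 2.0 are kept while a poisoned update at 50.0 lies far beyond the cutoff and is excluded from the average.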

ARJul 30, 2024
Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions

Sudeep Pasricha

Emerging artificial intelligence applications across the domains of computer vision, natural language processing, graph processing, and sequence prediction increasingly rely on deep neural networks (DNNs). These DNNs require significant compute and memory resources for training and inference. Traditional computing platforms such as CPUs, GPUs, and TPUs are struggling to keep up with the demands of increasingly complex and diverse DNNs. Optical computing represents an exciting new paradigm for light-speed acceleration of DNN workloads. In this article, we discuss the fundamentals and state-of-the-art developments in optical computing, with an emphasis on DNN acceleration. Various promising approaches are described for engineering optical devices, enhancing optical circuits, and designing architectures that can adapt optical computing to a variety of DNN workloads. Novel hardware/software co-design techniques that can intelligently tune and map DNN models to improve performance and energy efficiency on optical computing platforms, spanning high-performance as well as resource-constrained embedded, edge, and IoT platforms, are also discussed. Lastly, several open problems and future directions for research in this domain are highlighted.

8.7ARMay 6
MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding

Habib Ur Rahman, Tharini Suresh, Sudeep Pasricha et al.

This paper presents MCFlash, a practical and immediately deployable technique for executing bulk bitwise operations directly within commercial off-the-shelf (COTS) 3D NAND flash chips. MCFlash relies solely on standard user-mode instructions, combining Multi-Level Cell (MLC) data encodings with dynamically tuned read reference voltages to execute in-place bitwise operations. We evaluate MCFlash across diverse NAND flash chips, including both floating-gate and charge-trap variants from different generations. Our results represent the first demonstration of error-free on-chip bitwise operations, sustaining over one billion operations on fresh blocks and maintaining bit-error rates below 0.015% even after 10,000 program/erase (P/E) cycles.
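The MCFlash abstract says bitwise operations come from combining MLC data encodings with tuned read reference voltages. The toy model below illustrates why that can work, under assumed details that are not taken from the paper: a 2-bit cell using one common Gray-code mapping of (MSB, LSB) values to four threshold-voltage levels, with operand a stored in the LSB page and operand b in the MSB page. A normal LSB read recovers a, while moving the single read reference to just above the erased state makes the cell read 1 only when both bits are 1, i.e., a AND b. Names such as `program` and `sense` are hypothetical.

```python
# Assumed 2-bit MLC Gray-code mapping: (msb, lsb) -> threshold-voltage level,
# ordered from lowest (erased, level 0) to highest programmed state (level 3).
GRAY_LEVEL = {(1, 1): 0, (0, 1): 1, (0, 0): 2, (1, 0): 3}


def program(a: int, b: int) -> int:
    """Store operand a in the LSB page and operand b in the MSB page of one cell."""
    return GRAY_LEVEL[(b, a)]


def sense(level: int, ref: int) -> int:
    """Single read with the reference placed between levels ref-1 and ref:
    the cell conducts (reads 1) if its level is below the reference."""
    return 1 if level < ref else 0


def read_lsb(a_bits: list[int], b_bits: list[int]) -> list[int]:
    # Default LSB reference (between levels 1 and 2) recovers operand a.
    return [sense(program(a, b), ref=2) for a, b in zip(a_bits, b_bits)]


def bitwise_and(a_bits: list[int], b_bits: list[int]) -> list[int]:
    # Shifting the reference to just above the erased state isolates (1, 1).
    return [sense(program(a, b), ref=1) for a, b in zip(a_bits, b_bits)]
```

Other operations would need different encodings or reference placements; with this particular mapping, for instance, no single reference yields OR.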

LGMar 3, 2024
SANGRIA: Stacked Autoencoder Neural Networks with Gradient Boosting for Indoor Localization

Danish Gufran, Saideep Tiku, Sudeep Pasricha

Indoor localization is a critical task in many embedded applications, such as asset tracking, emergency response, and real-time navigation. In this article, we propose a novel fingerprinting-based framework for indoor localization called SANGRIA that uses stacked autoencoder neural networks with gradient-boosted trees. Our approach is designed to overcome the device heterogeneity challenge that can create uncertainty in wireless signal measurements across embedded devices used for localization. We compare SANGRIA to several state-of-the-art frameworks and demonstrate 42.96% lower average localization error across diverse indoor locales and heterogeneous devices.

22.5AIApr 30
Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha et al.

The design of embedded safety-critical systems, such as those used in next-generation automotive and autonomous platforms, is increasingly challenged by escalating system complexity, hardware-software heterogeneity, and the integration of intelligent, data-driven components. Ensuring dependability in such systems requires a holistic approach that spans multiple abstraction layers and encompasses both design-time and run-time assurance. Traditional methods for reliability, safety, and security management often fall short in addressing the dynamic and uncertain behaviors introduced by Artificial Intelligence (AI) and Machine Learning (ML) components, especially under stringent real-time, power, and safety constraints. While AI and ML offer powerful predictive, adaptive, and self-optimizing capabilities that can enhance system dependability, their inherent non-determinism, data dependence, and lack of formal guarantees introduce new challenges for verification, validation, and certification. This paper explores emerging methodologies, architectures, and frameworks for designing dependable autonomous and embedded systems in the era of AI. It highlights advances in reliability modeling, secure system design, and certification approaches that account for imperfect, learning-enabled components, aiming to bridge the gap between AI innovation and certifiable system-level dependability.

LGDec 16, 2023
STELLAR: Siamese Multi-Headed Attention Neural Networks for Overcoming Temporal Variations and Device Heterogeneity with Indoor Localization

Danish Gufran, Saideep Tiku, Sudeep Pasricha

Smartphone-based indoor localization has emerged as a cost-effective and accurate solution for localizing mobile and IoT devices indoors. However, the challenges of device heterogeneity and temporal variations have hindered its widespread adoption and accuracy. To address these challenges jointly and comprehensively, we propose STELLAR, a novel framework that implements a contrastive learning approach using a Siamese multi-headed attention neural network. STELLAR is the first solution to simultaneously tackle device heterogeneity and temporal variations in indoor localization without retraining the model (i.e., it is re-calibration-free). Our evaluations across diverse indoor environments show 8-75% improvements in accuracy over state-of-the-art techniques in addressing the device heterogeneity challenge. Moreover, STELLAR outperforms existing methods by 18-165% over 2 years of temporal variations, showcasing its robustness and adaptability.
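STELLAR is described as a contrastive learning approach built on a Siamese network. As a rough illustration of contrastive training in general (not STELLAR's exact objective, which the abstract does not specify), the classic margin-based contrastive loss pulls embeddings of fingerprints from the same location together and pushes embeddings from different locations at least `margin` apart. Embeddings are plain lists here, and `margin` is an assumed hyperparameter.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(
    emb_a: list[float], emb_b: list[float], same_location: bool, margin: float = 1.0
) -> float:
    """Classic margin-based contrastive loss for one Siamese pair."""
    d = euclidean(emb_a, emb_b)
    if same_location:
        return d ** 2  # pull matching pairs together
    return max(0.0, margin - d) ** 2  # push non-matching pairs at least `margin` apart
```

A non-matching pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on confusable pairs, such as the same location observed on two different phones versus nearby locations.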

DCApr 1, 2024
Game-Theoretic Deep Reinforcement Learning to Minimize Carbon Emissions and Energy Costs for AI Inference Workloads in Geo-Distributed Data Centers

Ninad Hogade, Sudeep Pasricha

Data centers are consuming increasing amounts of energy due to the rise of Artificial Intelligence (AI) workloads, which harms the environment and raises operational costs. Reducing operating expenses and carbon emissions while maintaining performance in data centers is a challenging problem. This work introduces a novel approach combining Game Theory (GT) and Deep Reinforcement Learning (DRL) to optimize the distribution of AI inference workloads across geo-distributed data centers, reducing carbon emissions and cloud operating (energy + data transfer) costs. The proposed technique integrates the principles of non-cooperative Game Theory into a DRL framework, enabling data centers to make intelligent workload allocation decisions while considering the heterogeneity of hardware resources, the dynamic nature of electricity prices, inter-data center data transfer costs, and carbon footprints. We conducted extensive experiments comparing our game-theoretic DRL (GT-DRL) approach with current DRL-based and other optimization techniques. The results demonstrate that our strategy outperforms the state of the art in reducing carbon emissions and minimizing cloud operating costs without compromising computational performance. This work has significant implications for achieving sustainability and cost-efficiency in data centers handling AI inference workloads across diverse geographic locations.

ARMar 7, 2024
Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators

Febin Sunny, Ebadollah Taheri, Mahdi Nikdast et al.

Modern machine learning (ML) applications are becoming increasingly complex, and monolithic (single-chip) accelerator architectures cannot keep up with their energy-efficiency and throughput demands. Even though modern digital electronic accelerators are gradually adopting 2.5D architectures with multiple smaller chiplets to improve scalability, they face fundamental limitations due to their reliance on slow metallic interconnects. This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient, high-throughput 2.5D ML accelerator architectures.

CRNov 22, 2024
SafeLight: Enhancing Security in Optical Convolutional Neural Network Accelerators

Salma Afifi, Ishan Thakkar, Sudeep Pasricha

The rapid proliferation of deep learning has revolutionized computing hardware, driving innovations to improve computationally expensive multiply-and-accumulate operations in deep neural networks. Among these innovations are integrated silicon-photonic systems, which have emerged as energy-efficient platforms capable of light-speed computation and communication, positioning optical neural network (ONN) platforms as a transformative technology for accelerating deep learning models such as convolutional neural networks (CNNs). However, the increasing complexity of optical hardware introduces new vulnerabilities, notably the risk of hardware trojan (HT) attacks. Despite the growing interest in ONN platforms, little attention has been given to how HT-induced threats can compromise performance and security. This paper presents an in-depth analysis of the impact of such attacks on the performance of CNN models accelerated by ONN accelerators. Specifically, we show how HTs can compromise microring resonators (MRs) in a state-of-the-art non-coherent ONN accelerator and reduce the classification accuracy of CNN models by up to 7.49% to 80.46% by targeting just 10% of the MRs. We then propose techniques to enhance ONN accelerator robustness against these attacks and show how the best techniques can effectively recover the lost accuracy.

ARJan 12, 2024
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics

Salma Afifi, Febin Sunny, Mahdi Nikdast et al.

In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) and graph processing have emerged as transformative technologies for natural language processing (NLP), computer vision, and graph-structured data applications. However, the complex structures of these models pose challenges for acceleration on conventional electronic platforms. In this paper, we describe novel hardware accelerators based on silicon photonics to accelerate transformer neural networks that are used in LLMs and graph neural networks for graph data processing. Our analysis demonstrates that both hardware accelerators achieve at least 10.2x throughput improvement and 3.8x better energy efficiency over multiple state-of-the-art electronic hardware accelerators designed for LLMs and graph processing.

ARMar 8
Accelerating Diffusion Models for Generative AI Applications with Silicon Photonics

Tharini Suresh, Salma Afifi, Sudeep Pasricha

Diffusion models have revolutionized generative AI, with their inherent capacity to generate highly realistic state-of-the-art synthetic data. However, these models employ an iterative denoising process over computationally intensive layers such as UNets and attention mechanisms. This results in high inference energy on conventional electronic platforms, and thus, there is an emerging need to accelerate these models in a sustainable manner. To address this challenge, we present a novel silicon photonics-based accelerator for diffusion models. Experimental evaluations demonstrate that our photonic accelerator achieves at least 3x better energy efficiency and 5.5x throughput improvement compared to state-of-the-art diffusion model accelerators.

LGNov 21, 2025
Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization

Akhil Singampalli, Sudeep Pasricha

Indoor localization using machine learning has gained traction due to the growing demand for location-based services. However, its long-term reliability is hindered by hardware/software variations across mobile devices, which shift the model's input distribution and create domain shifts. Further, evolving indoor environments can introduce new locations over time, expanding the output space and creating class shifts, which makes static machine learning models ineffective over time. To address these challenges, we propose a novel unified continual learning framework for indoor localization called MOELO that, for the first time, jointly addresses domain-incremental and class-incremental learning scenarios. MOELO enables a lightweight, robust, and adaptive localization solution that can be deployed on resource-limited mobile devices and is capable of continual learning in dynamic, heterogeneous real-world settings. This is made possible by a mixture-of-experts architecture, in which experts are incrementally trained per region and selected through an equiangular tight frame based gating mechanism, ensuring efficient routing and low-latency inference, all within a compact model footprint. Experimental evaluations show that MOELO achieves improvements of up to 25.6x in mean localization error, 44.5x in worst-case localization error, and 21.5x less forgetting compared to state-of-the-art frameworks across diverse buildings, mobile devices, and learning scenarios.
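MOELO's gating is described as based on an equiangular tight frame (ETF). A simplex ETF is a set of k unit vectors whose pairwise cosine similarity is exactly -1/(k-1), i.e., maximally and equally separated; one construction is M = sqrt(k/(k-1)) (I - (1/k) 1 1^T), whose columns are the prototypes. The sketch builds one in R^k and routes a feature to the expert whose prototype it aligns with best. The routing rule, and the assumption that features have already been projected into the same k-dimensional space, are illustrative and not taken from the paper.

```python
import math


def simplex_etf(k: int) -> list[list[float]]:
    """Return k unit vectors in R^k with pairwise cosine similarity -1/(k-1).
    The matrix sqrt(k/(k-1)) * (I - J/k) is symmetric, so rows equal columns."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)] for i in range(k)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def route(feature: list[float], prototypes: list[list[float]]) -> int:
    """Hypothetical gate: pick the expert whose ETF prototype best matches the feature."""
    return max(range(len(prototypes)), key=lambda i: cosine(feature, prototypes[i]))
```

Because the prototypes are fixed and equally spaced, adding experts for new regions does not require re-learning how existing prototypes relate to each other, which is one plausible reason an ETF suits incremental gating.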

LGJul 15, 2025
GATE: Graph Attention Neural Networks with Real-Time Edge Construction for Robust Indoor Localization using Mobile Embedded Devices

Danish Gufran, Sudeep Pasricha

Accurate indoor localization is crucial for enabling spatial context in smart environments and navigation systems. Wi-Fi Received Signal Strength (RSS) fingerprinting is a widely used indoor localization approach due to its compatibility with mobile embedded devices. Deep Learning (DL) models improve accuracy in localization tasks by learning RSS variations across locations, but they assume fingerprint vectors lie in a Euclidean space, failing to incorporate spatial relationships and the non-uniform distribution of real-world RSS noise. This results in poor generalization across heterogeneous mobile devices, where variations in hardware and signal processing distort RSS readings. Graph Neural Networks (GNNs) can improve upon conventional DL models by encoding indoor locations as nodes and modeling their spatial and signal relationships as edges. However, GNNs struggle with non-Euclidean noise distributions and suffer from the GNN blind spot problem, leading to degraded accuracy in environments with dense access points (APs). To address these challenges, we propose GATE, a novel framework that constructs an adaptive graph representation of fingerprint vectors while preserving an indoor state-space topology, modeling the non-Euclidean structure of RSS noise to mitigate environmental noise and address device heterogeneity. GATE introduces 1) a novel Attention Hyperspace Vector (AHV) for enhanced message passing, 2) a novel Multi-Dimensional Hyperspace Vector (MDHV) to mitigate the GNN blind spot, and 3) a new Real-Time Edge Construction (RTEC) approach for dynamic graph adaptation. Extensive real-world evaluations across multiple indoor spaces with varying path lengths, AP densities, and heterogeneous devices demonstrate that GATE achieves 1.6x to 4.72x lower mean localization errors and 1.85x to 4.57x lower worst-case errors compared to state-of-the-art indoor localization frameworks.
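GATE encodes indoor locations as nodes and their signal relationships as edges. The abstract does not detail how Real-Time Edge Construction works, so the sketch below uses a simple k-nearest-neighbour rule in RSS space as a hypothetical stand-in, purely to show how fingerprint vectors can be turned into a graph. Node names, RSS values (in dBm), and `k` are illustrative.

```python
import math


def knn_edges(fingerprints: dict[str, list[float]], k: int) -> set[tuple[str, str]]:
    """Connect each reference point to its k nearest neighbours in RSS space
    (hypothetical stand-in for GATE's edge construction)."""
    edges: set[tuple[str, str]] = set()
    for node, rss in fingerprints.items():
        # Rank all other reference points by Euclidean distance between RSS vectors.
        others = sorted(
            (math.dist(rss, other_rss), other)
            for other, other_rss in fingerprints.items()
            if other != node
        )
        for _, neighbour in others[:k]:
            edges.add(tuple(sorted((node, neighbour))))  # store as undirected edge
    return edges
```

Rebuilding the edge set whenever new RSS readings arrive is the simplest way to let the graph adapt over time; a real system would also have to weigh the cost of recomputing all pairwise distances.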

LGJun 18, 2025
Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates

Danish Gufran, Sudeep Pasricha

Indoor localization using deep learning (DL) has demonstrated strong accuracy in mapping Wi-Fi RSS fingerprints to physical locations; however, most existing DL frameworks function as black-box models, offering limited insight into how predictions are made or how models respond to real-world noise over time. This lack of interpretability hampers our ability to understand the impact of temporal variations (caused by environmental dynamics) and to adapt models for long-term reliability. To address this, we introduce LogNet, a novel logic gate-based framework designed to interpret and enhance DL-based indoor localization. LogNet enables transparent reasoning by identifying which access points (APs) are most influential for each reference point (RP) and reveals how environmental noise disrupts DL-driven localization decisions. This interpretability allows us to trace and diagnose model failures and adapt DL systems for more stable long-term deployments. Evaluations across multiple real-world building floorplans and over two years of temporal variation show that LogNet not only interprets the internal behavior of DL models but also improves performance, achieving up to 1.1x to 2.8x lower localization error, 3.4x to 43.3x smaller model size, and 1.5x to 3.6x lower latency compared to prior DL-based models.

LGJun 18, 2025
DAILOC: Domain-Incremental Learning for Indoor Localization using Smartphones

Akhil Singampalli, Danish Gufran, Sudeep Pasricha

Wi-Fi fingerprinting-based indoor localization faces significant challenges in real-world deployments due to domain shifts arising from device heterogeneity and temporal variations within indoor environments. Existing approaches often address these issues independently, resulting in poor generalization and susceptibility to catastrophic forgetting over time. In this work, we propose DAILOC, a novel domain-incremental learning framework that jointly addresses both temporal and device-induced domain shifts. DAILOC introduces a novel disentanglement strategy that separates domain shifts from location-relevant features using a multi-level variational autoencoder. Additionally, we introduce a novel memory-guided class latent alignment mechanism to address the effects of catastrophic forgetting over time. Experiments across multiple smartphones, buildings, and time instances demonstrate that DAILOC significantly outperforms state-of-the-art methods, achieving up to 2.74x lower average error and 4.6x lower worst-case error.

DCMay 29, 2025
Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters

Hayden Moore, Sirui Qi, Ninad Hogade et al.

In recent years, Large Language Models (LLMs) such as ChatGPT, CoPilot, and Gemini have been widely adopted across many domains. As the use of LLMs continues to grow, many efforts have focused on reducing the massive training overheads of these models. However, the environmental impact of handling user requests to LLMs is increasingly becoming a concern. Recent studies estimate that the costs of operating LLMs in their inference phase can exceed training costs by 25x per year. As LLMs are queried incessantly, the cumulative carbon footprint of the operational phase has been shown to far exceed that of the training phase. Further, estimates indicate that 500 ml of fresh water is expended for every 20-50 requests to LLMs during inference. To address these important sustainability issues with LLMs, we propose a novel framework called SLIT to co-optimize LLM quality of service (time-to-first-token), carbon emissions, water usage, and energy costs. The framework uses a machine learning (ML) based metaheuristic to enhance the sustainability of LLM hosting across geo-distributed cloud datacenters. Such a framework will become increasingly vital as LLMs proliferate.

ARJan 23, 2025
PhotoGAN: Generative Adversarial Neural Network Acceleration with Silicon Photonics

Tharini Suresh, Salma Afifi, Sudeep Pasricha

Generative Adversarial Networks (GANs) are at the forefront of AI innovation, driving advancements in areas such as image synthesis, medical imaging, and data augmentation. However, the unique computational operations within GANs, such as transposed convolutions and instance normalization, introduce significant inefficiencies when executed on traditional electronic accelerators, resulting in high energy consumption and suboptimal performance. To address these challenges, we introduce PhotoGAN, the first silicon-photonic accelerator designed to handle the specialized operations of GAN models. By leveraging the inherent high throughput and energy efficiency of silicon photonics, PhotoGAN offers an innovative, reconfigurable architecture capable of accelerating transposed convolutions and other GAN-specific layers. The accelerator also incorporates a sparse computation optimization technique to reduce redundant operations, improving computational efficiency. Our experimental results demonstrate that PhotoGAN achieves at least 4.4x higher GOPS and 2.18x lower energy-per-bit (EPB) compared to state-of-the-art accelerators, including GPUs and TPUs. These findings showcase PhotoGAN as a promising solution for the next generation of GAN acceleration, providing substantial gains in both performance and energy efficiency.
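The PhotoGAN abstract mentions a sparse computation optimization that removes redundant operations. One well-known source of redundancy in transposed convolutions is the zero-insertion formulation: upsampling the input with (stride - 1) zeros and then convolving spends multiply-accumulates (MACs) on those zeros. The generic 1D sketch below, which is not PhotoGAN's photonic dataflow, computes the same output both ways and counts MACs.

```python
def transposed_conv1d_naive(x: list[float], w: list[float], stride: int) -> tuple[list[float], int]:
    """Zero-insertion formulation: upsample, pad, then run a full convolution."""
    k = len(w)
    upsampled: list[float] = []
    for i, v in enumerate(x):
        upsampled.append(v)
        if i < len(x) - 1:
            upsampled.extend([0.0] * (stride - 1))  # inserted zeros
    padded = [0.0] * (k - 1) + upsampled + [0.0] * (k - 1)
    out, macs = [], 0
    for n in range(len(padded) - k + 1):
        acc = 0.0
        for j in range(k):
            acc += padded[n + j] * w[k - 1 - j]  # flipped kernel = true convolution
            macs += 1  # counted even when the input sample is an inserted zero
        out.append(acc)
    return out, macs


def transposed_conv1d_scatter(x: list[float], w: list[float], stride: int) -> tuple[list[float], int]:
    """Zero-skipping formulation: scatter each input times the kernel into the output."""
    k = len(w)
    out = [0.0] * ((len(x) - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += v * wj  # only real (nonzero-position) inputs are used
            macs += 1
    return out, macs
```

On a 3-element input with a 3-tap kernel and stride 2, both functions return the same 7-element output, but the zero-insertion version performs 21 MACs versus 9 for the scatter version; the gap grows with the stride.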

CVJan 8, 2025
UPAQ: A Framework for Real-Time and Energy-Efficient 3D Object Detection in Autonomous Vehicles

Abhishek Balasubramaniam, Febin P Sunny, Sudeep Pasricha

To enhance perception in autonomous vehicles (AVs), recent efforts are concentrating on 3D object detectors, which deliver more comprehensive predictions than traditional 2D object detectors at the cost of increased memory footprint and computational resource usage. We present a novel framework called UPAQ, which leverages semi-structured pattern pruning and quantization to improve the efficiency of LiDAR point-cloud and camera-based 3D object detectors on resource-constrained embedded AV platforms. Experimental results on the Jetson Orin Nano embedded platform indicate that UPAQ achieves up to 5.62x and 5.13x model compression rates, up to 1.97x and 1.86x boosts in inference speed, and up to 2.07x and 1.87x reductions in energy consumption compared to state-of-the-art model compression frameworks, on the PointPillars and SMOKE models, respectively.
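The UPAQ abstract names semi-structured pattern pruning and quantization without specifying the patterns. As a generic stand-in, the sketch below applies N:M pruning (keeping the 2 largest-magnitude weights in every group of 4, a common semi-structured pattern) followed by symmetric per-tensor int8 quantization. Both choices are assumptions, not UPAQ's actual scheme.

```python
def prune_n_of_m(weights: list[float], n: int = 2, m: int = 4) -> list[float]:
    """Keep the n largest-magnitude weights in every consecutive group of m."""
    pruned: list[float] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)[:n]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization; returns integer codes and the scale.
    Dequantize with code * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale
```

For `[0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]`, pruning keeps exactly two weights per group of four (`-0.9, 0.3` and `0.7, 0.4`); the fixed per-group sparsity is what makes such patterns friendlier to hardware than unstructured pruning.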

LGNov 13, 2024
SAFELOC: Overcoming Data Poisoning Attacks in Heterogeneous Federated Machine Learning for Indoor Localization

Akhil Singampalli, Danish Gufran, Sudeep Pasricha

Machine learning (ML) based indoor localization solutions are critical for many emerging applications, yet their efficacy is often compromised by hardware/software variations across mobile devices (i.e., device heterogeneity) and the threat of ML data poisoning attacks. Conventional methods aimed at countering these challenges show limited resilience to the uncertainties created by these phenomena. In this paper, we introduce SAFELOC, a novel framework that not only minimizes localization errors under these challenging conditions but also ensures model compactness for efficient mobile device deployment. Our framework targets a distributed and cooperative learning environment that uses federated learning (FL) to preserve user data privacy and assumes heterogeneous mobile devices carried by users (as in most real-world scenarios). Within this heterogeneous FL context, SAFELOC introduces a novel fused neural network architecture that performs both data poisoning detection and localization with a low model footprint. Additionally, a dynamic saliency map-based aggregation strategy is designed to adapt to the severity of the detected data poisoning scenario. Experimental evaluations demonstrate that SAFELOC achieves improvements of up to 5.9x in mean localization error, 7.8x in worst-case localization error, and a 2.1x reduction in model inference latency compared to state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and ML data poisoning attack scenarios.

CVJan 19, 2022
Object Detection in Autonomous Vehicles: Status and Open Challenges

Abhishek Balasubramaniam, Sudeep Pasricha

Object detection is a computer vision task that has become an integral part of many consumer applications today such as surveillance and security systems, mobile text recognition, and diagnosing diseases from MRI/CT scans. Object detection is also one of the critical components to support autonomous driving. Autonomous vehicles rely on the perception of their surroundings to ensure safe and robust driving performance. This perception system uses object detection algorithms to accurately determine objects such as pedestrians, vehicles, traffic signs, and barriers in the vehicle's vicinity. Deep learning-based object detectors play a vital role in finding and localizing these objects in real-time. This article discusses the state-of-the-art in object detectors and open challenges for their integration into autonomous vehicles.

CRJan 19, 2022
Roadmap for Cybersecurity in Autonomous Vehicles

Vipin Kumar Kukkala, Sooryaa Vignesh Thiruloga, Sudeep Pasricha

Autonomous vehicles are on the horizon and will transform transportation safety and comfort. These vehicles will be connected to various external systems and will use advanced embedded systems to perceive their environment and make intelligent decisions. However, this increased connectivity makes these vehicles vulnerable to various cyber-attacks that can have catastrophic effects. Attacks on automotive systems are already on the rise in today's vehicles and are expected to become more commonplace in future autonomous vehicles. Thus, there is a need to strengthen cybersecurity in future autonomous vehicles. In this article, we discuss major automotive cyber-attacks over the past decade and present state-of-the-art solutions that leverage artificial intelligence (AI). We propose a roadmap towards building secure autonomous vehicles and highlight key open challenges that need to be addressed.

ETDec 14, 2021
Pruning Coherent Integrated Photonic Neural Networks Using the Lottery Ticket Hypothesis

Sanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha et al.

Singular-value-decomposition-based coherent integrated photonic neural networks (SC-IPNNs) have a large footprint, suffer from high static power consumption for training and inference, and cannot be pruned using conventional DNN pruning techniques. We leverage the lottery ticket hypothesis to propose the first hardware-aware pruning method for SC-IPNNs that alleviates these challenges by minimizing the number of weight parameters. We prune a multi-layer perceptron-based SC-IPNN and show that up to 89% of the phase angles, which correspond to weight parameters in SC-IPNNs, can be pruned with a negligible accuracy loss (smaller than 5%) while reducing the static power consumption by up to 86%.
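The lottery ticket hypothesis is usually applied by training a network, pruning its smallest-magnitude parameters, and rewinding the survivors to their initial values. The sketch below shows one such round over a flat list of phase angles. The abstract describes the method as hardware-aware and does not give its exact pruning criterion, so plain magnitude ranking is an assumption here, as is representing a pruned phase angle by the value 0.

```python
def lottery_ticket_mask(trained: list[float], fraction: float) -> list[int]:
    """Mask out the `fraction` of parameters with the smallest trained magnitudes
    (assumed magnitude criterion; 0 = pruned, 1 = kept)."""
    count = int(len(trained) * fraction)
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]))
    pruned = set(order[:count])
    return [0 if i in pruned else 1 for i in range(len(trained))]


def rewind(initial: list[float], mask: list[int]) -> list[float]:
    """Reset surviving parameters to their original initialization (the 'winning
    ticket'); pruned phase angles are set to 0 under this sketch's assumption."""
    return [w0 * m for w0, m in zip(initial, mask)]
```

In practice the prune/rewind/retrain cycle is repeated over several rounds, removing a modest fraction each time, rather than pruning everything in one shot.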

LGNov 28, 2021
Siamese Neural Encoders for Long-Term Indoor Localization with Mobile Devices

Saideep Tiku, Sudeep Pasricha

Fingerprinting-based indoor localization is an emerging application domain for enhanced positioning and tracking of people and assets within indoor locales. The superior pairing of ubiquitously available WiFi signals with computationally capable smartphones is set to revolutionize the area of indoor localization. However, the observed signal characteristics from independently maintained WiFi access points vary greatly over time. Moreover, some of the WiFi access points visible at the initial deployment phase may be replaced or removed over time. These factors are often ignored in indoor localization frameworks and cause gradual and catastrophic degradation of localization accuracy post-deployment (over weeks and months). To overcome these challenges, we propose a Siamese neural encoder-based framework that offers up to 40% reduction in degradation of localization accuracy over time compared to the state-of-the-art in the area, without requiring any retraining.