ETJul 22, 2022
Characterizing Coherent Integrated Photonic Neural Networks under ImperfectionsSanmitra Banerjee, Mahdi Nikdast, Krishnendu Chakrabarty
Integrated photonic neural networks (IPNNs) are emerging as promising successors to conventional electronic AI accelerators as they offer substantial improvements in computing speed and energy efficiency. In particular, coherent IPNNs use arrays of Mach-Zehnder interferometers (MZIs) for unitary transformations to perform energy-efficient matrix-vector multiplication. However, the underlying MZI devices in IPNNs are susceptible to uncertainties stemming from optical lithographic variations and thermal crosstalk and can experience imprecisions due to non-uniform MZI insertion loss and quantization errors due to low-precision encoding in the tuned phase angles. In this paper, we, for the first time, systematically characterize the impact of such uncertainties and imprecisions (together referred to as imperfections) in IPNNs using a bottom-up approach. We show that their impact on IPNN accuracy can vary widely based on the tuned parameters (e.g., phase angles) of the affected components, their physical location, and the nature and distribution of the imperfections. To improve reliability measures, we identify critical IPNN building blocks that, under imperfections, can lead to catastrophic degradation in the classification accuracy. We show that under multiple simultaneous imperfections, the IPNN inferencing accuracy can degrade by up to 46%, even when the imperfection parameters are restricted within a small range. Our results also indicate that the inferencing accuracy is sensitive to imperfections affecting the MZIs in the linear layers next to the input layer of the IPNN.
ETAug 7, 2023
Analysis of Optical Loss and Crosstalk Noise in MZI-based Coherent Photonic Neural NetworksAmin Shafiee, Sanmitra Banerjee, Krishnendu Chakrabarty et al.
With the continuous increase in the size and complexity of machine learning models, the need for specialized hardware to efficiently run such models is rapidly growing. To address such a need, silicon-photonic-based neural network (SP-NN) accelerators have recently emerged as a promising alternative to electronic accelerators due to their lower latency and higher energy efficiency. Not only can SP-NNs alleviate the fan-in and fan-out problem with linear algebra processors, their operational bandwidth can match that of the photodetection rate (typically 100 GHz), which is at least over an order of magnitude faster than electronic counterparts that are restricted to a clock rate of a few GHz. Unfortunately, the underlying silicon photonic devices in SP-NNs suffer from inherent optical losses and crosstalk noise originating from fabrication imperfections and undesired optical couplings, the impact of which accumulates as the network scales up. Consequently, the inferencing accuracy in an SP-NN can be affected by such inefficiencies -- e.g., can drop to below 10% -- the impact of which is yet to be fully studied. In this paper, we comprehensively model the optical loss and crosstalk noise using a bottom-up approach, from the device to the system level, in coherent SP-NNs built using Mach-Zehnder interferometer (MZI) devices. The proposed models can be applied to any SP-NN architecture with different configurations to analyze the effect of loss and crosstalk. Such an analysis is important where there are inferencing accuracy and scalability requirements to meet when designing an SP-NN. Using the proposed analytical framework, we show a high power penalty and a catastrophic inferencing accuracy drop of up to 84% for SP-NNs of different scales with three known MZI mesh configurations (i.e., Reck, Clements, and Diamond) due to accumulated optical loss and crosstalk noise.
LGMar 22, 2023
TRON: Transformer Neural Network Acceleration with Non-Coherent Silicon PhotonicsSalma Afifi, Febin Sunny, Mahdi Nikdast et al.
Transformer neural networks are rapidly being integrated into state-of-the-art solutions for natural language processing (NLP) and computer vision. However, the complex structure of these models creates challenges for accelerating their execution on conventional electronic platforms. We propose the first silicon photonic hardware neural network accelerator called TRON for transformer-based models such as BERT, and Vision Transformers. Our analysis demonstrates that TRON exhibits at least 14x better throughput and 8x better energy efficiency, in comparison to state-of-the-art transformer accelerators.
ARMay 17, 2022
A Silicon Photonic Accelerator for Convolutional Neural Networks with Heterogeneous QuantizationFebin Sunny, Mahdi Nikdast, Sudeep Pasricha
Parameter quantization in convolutional neural networks (CNNs) can help generate efficient models with lower memory footprint and computational complexity. But, homogeneous quantization can result in significant degradation of CNN model accuracy. In contrast, heterogeneous quantization represents a promising approach to realize compact, quantized models with higher inference accuracies. In this paper, we propose HQNNA, a CNN accelerator based on non-coherent silicon photonics that can accelerate both homogeneously quantized and heterogeneously quantized CNN models. Our analyses show that HQNNA achieves up to 73.8x better energy-per-bit and 159.5x better throughput-energy efficiency than state-of-the-art photonic CNN accelerators.
LGAug 31, 2022
RecLight: A Recurrent Neural Network Accelerator with Integrated Silicon PhotonicsFebin Sunny, Mahdi Nikdast, Sudeep Pasricha
Recurrent Neural Networks (RNNs) are used in applications that learn dependencies in data sequences, such as speech recognition, human activity recognition, and anomaly detection. In recent years, newer RNN variants, such as GRUs and LSTMs, have been used for implementing these applications. As many of these applications are employed in real-time scenarios, accelerating RNN/LSTM/GRU inference is crucial. In this paper, we propose a novel photonic hardware accelerator called RecLight for accelerating simple RNNs, GRUs, and LSTMs. Simulation results indicate that RecLight achieves 37x lower energy-per-bit and 10% better throughput compared to the state-of-the-art.
ARJul 4, 2023
GHOST: A Graph Neural Network Accelerator using Silicon PhotonicsSalma Afifi, Febin Sunny, Amin Shafiee et al.
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.
ARJan 28, 2023
Machine Learning Accelerators in 2.5D Chiplet Platforms with Silicon PhotonicsFebin Sunny, Ebadollah Taheri, Mahdi Nikdast et al.
Domain-specific machine learning (ML) accelerators such as Google's TPU and Apple's Neural Engine now dominate CPUs and GPUs for energy-efficient ML processing. However, the evolution of electronic accelerators is facing fundamental limits due to the limited computation density of monolithic processing chips and the reliance on slow metallic interconnects. In this paper, we present a vision of how optical computation and communication can be integrated into 2.5D chiplet platforms to drive an entirely new class of sustainable and scalable ML hardware accelerators. We describe how cross-layer design and fabrication of optical devices, circuits, and architectures, and hardware/software codesign can help design efficient photonics-based 2.5D chiplet platforms to accelerate emerging ML workloads.
LGMar 22, 2023
Cross-Layer Design for AI Acceleration with Non-Coherent Optical ComputingFebin Sunny, Mahdi Nikdast, Sudeep Pasricha
Emerging AI applications such as ChatGPT, graph convolutional networks, and other deep neural networks require massive computational resources for training and inference. Contemporary computing platforms such as CPUs, GPUs, and TPUs are struggling to keep up with the demands of these AI applications. Non-coherent optical computing represents a promising approach for light-speed acceleration of AI workloads. In this paper, we show how cross-layer design can overcome challenges in non-coherent optical computing platforms. We describe approaches for optical device engineering, tuning circuit enhancements, and architectural innovations to adapt optical computing to a variety of AI workloads. We also discuss techniques for hardware/software co-design that can intelligently map and adapt AI software to improve its performance on non-coherent optical computing platforms.
ARJul 11, 2024
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network AccelerationFebin Sunny, Amin Shafiee, Abhishek Balasubramaniam et al.
Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
ARMar 7, 2024
Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware AcceleratorsFebin Sunny, Ebadollah Taheri, Mahdi Nikdast et al.
Modern machine learning (ML) applications are becoming increasingly complex and monolithic (single chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands. Even though modern digital electronic accelerators are gradually adopting 2.5D architectures with multiple smaller chiplets to improve scalability, they face fundamental limitations due to a reliance on slow metallic interconnects. This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient and high throughput 2.5D ML accelerator architectures.
ARJan 12, 2024
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon PhotonicsSalma Afifi, Febin Sunny, Mahdi Nikdast et al.
In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) and graph processing have emerged as transformative technologies for natural language processing (NLP), computer vision, and graph-structured data applications. However, the complex structures of these models pose challenges for acceleration on conventional electronic platforms. In this paper, we describe novel hardware accelerators based on silicon photonics to accelerate transformer neural networks that are used in LLMs and graph neural networks for graph data processing. Our analysis demonstrates that both hardware accelerators achieve at least 10.2x throughput improvement and 3.8x better energy efficiency over multiple state-of-the-art electronic hardware accelerators designed for LLMs and graph processing.
ETApr 6
Light-Bound Transformers: Hardware-Anchored Robustness for Silicon-Photonic Computer Vision SystemsXuming Chen, Deniz Najafi, Chengwei Zhou et al.
Deploying Vision Transformers (ViTs) on near-sensor analog accelerators demands training pipelines that are explicitly aligned with device-level noise and energy constraints. We introduce a compact framework for silicon-photonic execution of ViTs that integrates measured hardware noise, robust attention training, and an energy-aware processing flow. We first characterize bank-level noise in microring-resonator (MR) arrays, including fabrication variation, thermal drift, and amplitude noise, and convert these measurements into closed-form, activation-dependent variance proxies for attention logits and feed-forward activations. Using these proxies, we develop Chance-Constrained Training (CCT), which enforces variance-normalized logit margins to bound attention rank flips, and a noise-aware LayerNorm that stabilizes feature statistics without changing the optical schedule. These components yield a practical ``measure $\rightarrow$ model $\rightarrow$ train $\rightarrow$ run'' pipeline that optimizes accuracy under noise while respecting system energy limits. Hardware-in-the-loop experiments with MR photonic banks show that our approach restores near-clean accuracy under realistic noise budgets, with no in-situ learning or additional optical MACs.
ETDec 14, 2021
Pruning Coherent Integrated Photonic Neural Networks Using the Lottery Ticket HypothesisSanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha et al.
Singular-value-decomposition-based coherent integrated photonic neural networks (SC-IPNNs) have a large footprint, suffer from high static power consumption for training and inference, and cannot be pruned using conventional DNN pruning techniques. We leverage the lottery ticket hypothesis to propose the first hardware-aware pruning method for SC-IPNNs that alleviates these challenges by minimizing the number of weight parameters. We prune a multi-layer perceptron-based SC-IPNN and show that up to 89% of the phase angles, which correspond to weight parameters in SC-IPNNs, can be pruned with a negligible accuracy loss (smaller than 5%) while reducing the static power consumption by up to 86%.
ETDec 11, 2021
CHAMP: Coherent Hardware-Aware Magnitude Pruning of Integrated Photonic Neural NetworksSanmitra Banerjee, Mahdi Nikdast, Sudeep Pasricha et al.
We propose a novel hardware-aware magnitude pruning technique for coherent photonic neural networks. The proposed technique can prune 99.45% of network parameters and reduce the static power consumption by 98.23% with a negligible accuracy loss.
LGSep 9, 2021
SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep LearningFebin Sunny, Mahdi Nikdast, Sudeep Pasricha
Sparse neural networks can greatly facilitate the deployment of neural networks on resource-constrained platforms as they offer compact model sizes while retaining inference accuracy. Because of the sparsity in parameter matrices, sparse neural networks can, in principle, be exploited in accelerator architectures for improved energy-efficiency and latency. However, to realize these improvements in practice, there is a need to explore sparsity-aware hardware-software co-design. In this paper, we propose a novel silicon photonics-based sparse neural network inference accelerator called SONIC. Our experimental analysis shows that SONIC can achieve up to 5.8x better performance-per-watt and 8.4x lower energy-per-bit than state-of-the-art sparse electronic neural network accelerators; and up to 13.8x better performance-per-watt and 27.6x lower energy-per-bit than the best known photonic neural network accelerators.
LGJul 12, 2021
ROBIN: A Robust Optical Binary Neural Network AcceleratorFebin P. Sunny, Asif Mirza, Mahdi Nikdast et al.
Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and also many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ~4x lower than electronic BNN accelerators and ~933x lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ~3x and ~25x better performance than electronic and photonic BNN accelerators, respectively.
LGFeb 13, 2021
CrossLight: A Cross-Layer Optimized Silicon Photonic Neural Network AcceleratorFebin Sunny, Asif Mirza, Mahdi Nikdast et al.
Domain-specific neural network accelerators have seen growing interest in recent years due to their improved energy efficiency and inference performance compared to CPUs and GPUs. In this paper, we propose a novel cross-layer optimized neural network accelerator called CrossLight that leverages silicon photonics. CrossLight includes device-level engineering for resilience to process variations and thermal crosstalk, circuit-level tuning enhancements for inference latency reduction, and architecture-level optimization to enable higher resolution, better energy-efficiency, and improved throughput. On average, CrossLight offers 9.5x lower energy-per-bit and 15.9x higher performance-per-watt at 16-bit resolution than state-of-the-art photonic deep learning accelerators.
ETDec 19, 2020
Modeling Silicon-Photonic Neural Networks under UncertaintiesSanmitra Banerjee, Mahdi Nikdast, Krishnendu Chakrabarty
Silicon-photonic neural networks (SPNNs) offer substantial improvements in computing speed and energy efficiency compared to their digital electronic counterparts. However, the energy efficiency and accuracy of SPNNs are highly impacted by uncertainties that arise from fabrication-process and thermal variations. In this paper, we present the first comprehensive and hierarchical study on the impact of random uncertainties on the classification accuracy of a Mach-Zehnder Interferometer (MZI)-based SPNN. We show that such impact can vary based on both the location and characteristics (e.g., tuned phase angles) of a non-ideal silicon-photonic device. Simulation results show that in an SPNN with two hidden layers and 1374 tunable-thermal-phase shifters, random uncertainties even in mature fabrication processes can lead to a catastrophic 70% accuracy loss.