Gokul Krishnan

h-index: 11

Papers: 4

Citations: 45

Novelty: 49%

AI Score: 23

Ranked #173,402 of 194,257 authors (top 89%)

#601 in AR (top 94%)

4 Papers

1.2 | AR | May 15, 2022

COIN: Communication-Aware In-Memory Acceleration for Graph Convolutional Networks

Sumit K. Mandal, Gokul Krishnan, A. Alper Goksoy et al.

Graph convolutional networks (GCNs) have shown remarkable learning capabilities when processing graph-structured data found inherently in many application areas. GCNs distribute the outputs of neural networks embedded in each vertex over multiple iterations to take advantage of the relations captured by the underlying graphs. Consequently, they incur a significant amount of computation and irregular communication overheads, which call for GCN-specific hardware accelerators. To this end, this paper presents a communication-aware in-memory computing architecture (COIN) for GCN hardware acceleration. Besides accelerating the computation using custom compute elements (CE) and in-memory computing, COIN aims at minimizing the intra- and inter-CE communication in GCN operations to optimize the performance and energy efficiency. Experimental evaluations with widely used datasets show up to 105x improvement in energy consumption compared to a state-of-the-art GCN accelerator.

3.3 | AR | Jul 6, 2021

Impact of On-Chip Interconnect on In-Memory Acceleration of Deep Neural Networks

Gokul Krishnan, Sumit K. Mandal, Chaitali Chakrabarti et al.

With the widespread use of Deep Neural Networks (DNNs), machine learning algorithms have evolved in two diverse directions: one with ever-increasing connection density for better accuracy and the other with more compact sizing for energy efficiency. The increase in connection density increases on-chip data movement, which makes efficient on-chip communication a critical function of the DNN accelerator. The contribution of this work is threefold. First, we illustrate that the point-to-point (P2P)-based interconnect is incapable of handling a high volume of on-chip data movement for DNNs. Second, we evaluate P2P and network-on-chip (NoC) interconnect (with a regular topology such as a mesh) for SRAM- and ReRAM-based in-memory computing (IMC) architectures for a range of DNNs. This analysis shows the necessity for the optimal interconnect choice for an IMC DNN accelerator. Finally, we perform an experimental evaluation for different DNNs to empirically obtain the performance of the IMC architecture with both NoC-tree and NoC-mesh. We conclude that, at the tile level, NoC-tree is appropriate for compact DNNs employed at the edge, and NoC-mesh is necessary to accelerate DNNs with high connection density. Furthermore, we propose a technique to determine the optimal choice of interconnect for any given DNN. In this technique, we use analytical models of NoC to evaluate end-to-end communication latency of any given DNN. We demonstrate that the interconnect optimization in the IMC architecture results in up to 6x improvement in energy-delay-area product for VGG-19 inference compared to the state-of-the-art ReRAM-based IMC architectures.

1.8 | LG | Nov 11, 2019

Structural Pruning in Deep Neural Networks: A Small-World Approach

Gokul Krishnan, Xiaocong Du, Yu Cao

Deep Neural Networks (DNNs) are usually over-parameterized, causing excessive memory and interconnection cost on the hardware platform. Existing pruning approaches remove secondary parameters at the end of training to reduce the model size, but without exploiting the intrinsic network property, they still require the full interconnection to prepare the network. Inspired by the observation that brain networks follow the Small-World model, we propose a novel structural pruning scheme, which includes (1) hierarchically trimming the network into a Small-World model before training, (2) training the network for a given dataset, and (3) optimizing the network for accuracy. The new scheme effectively reduces both the model size and the interconnection needed before training, achieving a locally clustered and globally sparse model. We demonstrate our approach on LeNet-5 for MNIST and VGG-16 for CIFAR-10, decreasing the number of parameters to 2.3% and 9.02% of the baseline model, respectively.
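To make the "locally clustered and globally sparse" idea concrete, here is a minimal Python sketch of a small-world connectivity mask for a single fully connected layer. It is an illustration only, not the authors' hierarchical trimming procedure: it assumes a Watts-Strogatz-style construction in which each output unit first connects to its `k` nearest inputs on a ring, and each of those connections is then rewired to a random input with probability `p`. The function name `small_world_mask` and all of its parameters are hypothetical.

```python
import random


def small_world_mask(n_in, n_out, k, p, seed=0):
    """Return an n_out x n_in boolean mask with exactly k connections per output.

    Assumed construction (Watts-Strogatz style, not taken from the paper):
    local ring-lattice connections, each rewired with probability p.
    """
    if not 0 < k <= n_in:
        raise ValueError("k must satisfy 0 < k <= n_in")
    rng = random.Random(seed)
    mask = [[False] * n_in for _ in range(n_out)]
    for j in range(n_out):
        row = mask[j]
        # Place output j on the input ring and connect its k nearest inputs.
        center = j * n_in // n_out
        neighbors = [(center + d) % n_in for d in range(-(k // 2), k - k // 2)]
        for i in neighbors:
            row[i] = True
        # Rewire each local connection with probability p to a free input,
        # which creates the occasional long-range shortcut.
        for i in neighbors:
            if rng.random() < p:
                free = [c for c in range(n_in) if not row[c]]
                if free:
                    row[i] = False
                    row[rng.choice(free)] = True
    return mask


# Usage: 10 outputs, 20 inputs, 4 connections per output.
mask = small_world_mask(n_in=20, n_out=10, k=4, p=0.1)
density = sum(map(sum, mask)) / (20 * 10)
print(density)  # 0.2, i.e. k / n_in, regardless of the rewiring
```

In a training setup, a mask like this would be applied elementwise to the layer's weight matrix before training, so that pruned connections never need to be instantiated.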

2.2 | NE | May 28, 2019

Towards Efficient Neural Networks On-a-chip: Joint Hardware-Algorithm Approaches

Xiaocong Du, Gokul Krishnan, Abinash Mohanty et al.

Machine learning algorithms have made significant advances in many applications. However, their hardware implementation on state-of-the-art platforms still faces several challenges and is limited by various factors, such as memory volume, memory bandwidth and interconnection overhead. The adoption of the crossbar architecture with emerging memory technology partially solves the problem but induces process variation and other concerns. In this paper, we will present novel solutions to two fundamental issues in crossbar implementation of Artificial Intelligence (AI) algorithms: device variation and insufficient interconnections. These solutions are inspired by the statistical properties of algorithms themselves, especially the redundancy in neural network nodes and connections. By Random Sparse Adaptation and pruning the connections following the Small-World model, we demonstrate robust and efficient performance on representative datasets such as MNIST and CIFAR-10. Moreover, we present a Continuous Growth and Pruning algorithm for future learning and adaptation on hardware.