CV · AR · Aug 9, 2021

Tensor Yard: One-Shot Algorithm of Hardware-Friendly Tensor-Train Decomposition for Convolutional Neural Networks

arXiv:2108.04029v1 · 2 citations
AI Analysis

This work addresses the need for efficient deep learning deployment on specific hardware platforms like NPUs, though it appears incremental as it builds on existing Tensor-Train decomposition methods.

The paper aims to accelerate convolutional neural networks on specific hardware. It introduces a hardware-friendly Tensor-Train decomposition and a one-shot training algorithm, Tensor Yard, which optimizes the order in which network layers are decomposed. The reported result is a 14.6% speedup for ResNet-101 on Ascend 310 NPU devices with a drop of only 0.5 in top-1 ImageNet accuracy.

Deep Learning is now widely used in many economic, technical, and scientific areas. The efficiency of solutions based on Deep Neural Networks should account not only for the quality metric on the target task but also for latency and the design constraints of the target platform. In this paper we present a novel hardware-friendly Tensor-Train decomposition implementation for Convolutional Neural Networks, together with Tensor Yard, a one-shot training algorithm that optimizes the order in which network layers are decomposed. These ideas make it possible to accelerate ResNet models on Ascend 310 NPU devices without significant loss of accuracy. For example, we accelerate ResNet-101 by 14.6% with a drop of 0.5 in top-1 ImageNet accuracy.

