DC · LG · NE · Apr 20, 2020

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

arXiv:2004.10854v1 · 9 citations
AI Analysis

This work addresses the engineering cost problem for developers using FPGA-based tensor accelerators in deep learning, though it appears incremental as it builds on existing overlay and auto-tuning concepts.

The paper tackles the challenge of adapting tensor accelerators to evolving deep learning frameworks and precision options by proposing a programmable overlay (τ-VTA) and an agile auto-tuning method, achieving higher performance and faster convergence than state-of-the-art approaches.

Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting them to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than the state of the art.
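The abstract frames optimization as auto-tuning the overlay's configuration (tile shape, precision) for a given workload. The sketch below is a minimal illustration of such a loop under loud assumptions: the three knobs (block_in, block_out, bits), their value lists, the toy area budget, the 8-bit accuracy floor and the cycle formula are all hypothetical, and plain greedy hill climbing stands in for the paper's agile-inspired method, which the abstract does not describe.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical search space: tile shape and operand width of an overlay GEMM core.
SPACE: Dict[str, List[int]] = {
    "block_in": [8, 16, 32, 64],
    "block_out": [8, 16, 32, 64],
    "bits": [4, 8, 16],
}
KNOBS = list(SPACE)  # fixed knob order: block_in, block_out, bits


@dataclass(frozen=True)
class Config:
    block_in: int
    block_out: int
    bits: int


def cost(cfg: Config, k: int, n: int, min_bits: int, budget: int) -> float:
    """Toy cost model: estimated cycles for a (1 x k) @ (k x n) GEMM, inf if infeasible."""
    if cfg.bits < min_bits:  # assumed accuracy floor: precision too low
        return math.inf
    if cfg.block_in * cfg.block_out * cfg.bits > budget:  # assumed area budget exceeded
        return math.inf
    # Assumption: one tile-GEMM per cycle; partial tiles are padded to full size.
    return math.ceil(k / cfg.block_in) * math.ceil(n / cfg.block_out)


def neighbours(cfg: Config) -> Iterator[Config]:
    """Configs that differ from cfg by one step along a single knob."""
    for knob in KNOBS:
        options = SPACE[knob]
        i = options.index(getattr(cfg, knob))
        for j in (i - 1, i + 1):
            if 0 <= j < len(options):
                values = {name: getattr(cfg, name) for name in KNOBS}
                values[knob] = options[j]
                yield Config(**values)


def hill_climb(start: Config, k: int, n: int, min_bits: int = 8,
               budget: int = 32768) -> Tuple[Config, float, int]:
    """Greedy local search; returns (best config, its cost, distinct configs evaluated)."""
    cache: Dict[Config, float] = {}

    def evaluate(cfg: Config) -> float:
        # Memoize so each configuration is scored by the cost model only once.
        if cfg not in cache:
            cache[cfg] = cost(cfg, k, n, min_bits, budget)
        return cache[cfg]

    current, current_cost = start, evaluate(start)
    while True:
        best = min(neighbours(current), key=evaluate)
        if evaluate(best) >= current_cost:  # no neighbour improves: local optimum
            return current, current_cost, len(cache)
        current, current_cost = best, evaluate(best)


if __name__ == "__main__":
    cfg, cycles, evals = hill_climb(Config(8, 8, 8), k=512, n=512)
    print(cfg, cycles, evals)
```

Starting from Config(8, 8, 8) on a 512 × 512 problem, the search climbs to Config(64, 64, 8) at 64 estimated cycles, which is also the exhaustive optimum under this toy model, while scoring only a subset of the 48 possible configurations. Greedy local search can stall in local optima on realistic cost surfaces; TVM's AutoTVM tuners, for instance, pair a learned cost model with simulated-annealing exploration instead.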
