LG AI AR
Mar 13, 2022

FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support

Seock-Hwan Noh, Jahyun Koo, Seunghyun Lee, Jongse Park, Jaeha Kung

arXiv:2203.06673v111.128 citations, h-index: 20

Originality: Incremental advance

AI Analysis

FlexBlock addresses the problem of slow, energy-intensive DNN training for AI researchers and practitioners. It offers a versatile hardware solution that is an incremental advance over prior BFP accelerators.

The paper tackles the computational expense of DNN training by proposing FlexBlock, a flexible accelerator that supports multiple block floating point (BFP) precision modes. Compared to other training accelerators, it reports an average 1.5-5.3x speedup and 2.4-7.0x better energy efficiency, with marginal accuracy loss relative to full-precision training.

Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high-performance GPUs. As a remedy for this challenge, the community has started exploring the use of more efficient data representations in the training process, e.g., block floating point (BFP). However, prior work on BFP-based DNN accelerators relies on a specific BFP representation, making those accelerators less versatile. This paper builds upon an algorithmic observation that training can be accelerated by leveraging multiple BFP precisions without compromising the final accuracy. Backed by this algorithmic opportunity, we develop a flexible DNN training accelerator, dubbed FlexBlock, which supports three different BFP precision modes that may differ among activation, weight, and gradient tensors. While several prior works proposed such multi-precision support for DNN accelerators, they focus only on inference, and their core utilization is suboptimal at a fixed precision and for specific layer types when training is considered. Instead, FlexBlock is designed so that high core utilization is achievable for i) various layer types, and ii) three BFP precisions, by mapping data to its compute units in a hierarchical manner. We evaluate the effectiveness of the FlexBlock architecture using well-known DNNs on the CIFAR, ImageNet, and WMT14 datasets. As a result, training on FlexBlock improves the training speed by 1.5~5.3x and the energy efficiency by 2.4~7.0x on average compared to other training accelerators, and incurs marginal accuracy loss compared to full-precision training.
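To make the idea of block floating point concrete, here is a minimal sketch of generic BFP quantization: every value in a block shares one exponent, and each value keeps only a short signed integer mantissa. The function names, the block contents, and the 4-bit and 8-bit mantissa widths are assumptions chosen for illustration; they are not taken from the paper and do not describe FlexBlock's actual formats or hardware.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas.

    mantissa_bits counts the sign bit (illustrative convention, not the paper's).
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**shared_exp with 0.5 <= m < 1,
    # so every value in the block has magnitude below 2**shared_exp.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1  # largest representable mantissa
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


def max_abs_error(block, mantissa_bits):
    """Largest absolute round-trip error for a block at a given mantissa width."""
    shared_exp, mantissas = bfp_quantize(block, mantissa_bits)
    restored = bfp_dequantize(shared_exp, mantissas, mantissa_bits)
    return max(abs(a - b) for a, b in zip(block, restored))


if __name__ == "__main__":
    block = [0.5, -0.25, 0.125, 0.03125]  # hypothetical tensor block
    for bits in (4, 8):  # hypothetical low- and high-precision modes
        print(bits, bfp_quantize(block, bits), max_abs_error(block, bits))
```

In this sketch the narrower mantissa flushes the smallest value in the block to zero, while the wider one represents the block exactly. That trade-off between cost and fidelity is the reason an accelerator might want different precisions for activation, weight, and gradient tensors, as the abstract describes.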
