Bitwidth-Specific Logarithmic Arithmetic for Future Hardware-Accelerated Training
This work addresses the high computational cost of training deep learning models, with future hardware accelerators in mind, and represents an incremental advance in low-precision training methods.
This work reduces the computational cost of deep learning training by introducing a bitwidth-specific logarithmic arithmetic method for low-precision fixed-point training. Training with 12-bit integer arithmetic shows minimal accuracy degradation relative to 32-bit floating-point training, and the proposed LNS multiply-accumulate units achieve up to 32.5% area reduction and 53.5% energy reduction compared to linear fixed-point equivalents.
While advancements in quantization have significantly reduced the computational costs of inference in deep learning, training still predominantly relies on complex floating-point arithmetic. Low-precision fixed-point training presents a compelling alternative. This work introduces a novel enhancement to low-precision logarithmic fixed-point training, geared towards future hardware accelerator designs. We propose incorporating bitwidth into the design of approximations to arithmetic operations. To this end, we introduce a new hardware-friendly, piece-wise linear approximation for addition in the logarithmic number system (LNS). Using simulated annealing, we optimize this approximation at different precision levels. A C++ bit-true simulation demonstrates training of VGG-11 and VGG-16 models on CIFAR-100 and TinyImageNet, respectively, using 12-bit integer arithmetic with minimal accuracy degradation compared to 32-bit floating-point training. Our hardware study reveals up to 32.5% reduction in area and 53.5% reduction in energy consumption for the proposed LNS multiply-accumulate units compared to linear fixed-point equivalents.
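To make the idea of a piece-wise linear approximation for logarithmic addition concrete, the following is a minimal sketch and not the approximation proposed in this work. In an LNS, positive values are stored as their base-2 logarithms, so multiplication becomes addition, while addition requires the correction term log2(1 + 2^-d), where d is the absolute difference of the two log-domain operands. The sketch approximates that term with straight-line segments. The uniform segment boundaries, the segment count, the cutoff `d_max`, the use of floating point instead of bit-true integers, and all function names are illustrative assumptions; the simulated-annealing optimization and the bitwidth-specific tuning described above are not reproduced here.

```python
import math


def delta_plus(d: float) -> float:
    """Exact correction term log2(1 + 2^-d) for d >= 0."""
    return math.log2(1.0 + 2.0 ** (-d))


def make_pwl_table(num_segments: int = 8, d_max: float = 8.0):
    """Build (slope, intercept) pairs by interpolating delta_plus
    at uniformly spaced breakpoints (an assumption of this sketch)."""
    width = d_max / num_segments
    table = []
    for i in range(num_segments):
        d0, d1 = i * width, (i + 1) * width
        y0, y1 = delta_plus(d0), delta_plus(d1)
        slope = (y1 - y0) / width
        table.append((slope, y0 - slope * d0))
    return table, width, d_max


def lns_add(a: float, b: float, table, width: float, d_max: float) -> float:
    """Approximate log2(2^a + 2^b) from the log-domain operands a and b."""
    hi, d = max(a, b), abs(a - b)
    if d >= d_max:
        # Beyond the table range the correction is treated as zero.
        return hi
    # Clamp the index to guard against floating-point rounding at the edge.
    idx = min(int(d / width), len(table) - 1)
    slope, intercept = table[idx]
    return hi + slope * d + intercept


if __name__ == "__main__":
    table, width, d_max = make_pwl_table()
    a, b = math.log2(6.0), math.log2(10.0)
    approx = lns_add(a, b, table, width, d_max)
    print("approximate sum:", 2.0 ** approx, "exact sum:", 16.0)
```

In a hardware setting the slopes and breakpoints would typically be constrained so that each segment needs only shifts and adds, and the table would be re-derived for each operand bitwidth. This sketch leaves those constraints out to keep the core identity visible.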