K-TanH: Efficient TanH For Deep Learning
This work addresses a computational bottleneck in deep learning and is relevant to practitioners who need faster, more efficient activation functions, though the contribution is incremental because it builds on existing approximations.
The paper tackles the cost of computing the TanH activation function in deep learning by proposing K-TanH, a hardware-efficient approximation built from low-precision integer operations. The authors report a speed-up of more than 5x over Intel SVML and state-of-the-art results when training a language translation model.
We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular activation function TanH for deep learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating-point operations are needed), where the parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH works on various numerical formats, such as Float32 and BFloat16. High-quality approximations to other activation functions, e.g., Sigmoid, Swish, and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates a $>5\times$ speed-up over Intel SVML, and it is consistently more efficient than other approximations that use floating-point arithmetic. Finally, we achieve state-of-the-art BLEU score and convergence results for training the language translation model GNMT on WMT16 data sets with approximate TanH obtained via K-TanH on BFloat16 inputs.
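The statement that other activations can be derived from a TanH approximation can be illustrated with standard identities. The sketch below is not the authors' implementation: `tanh_approx` is a hypothetical stand-in that simply calls `math.tanh`, and the function names are assumptions made for illustration. Sigmoid follows from the exact identity $\sigma(x) = \tfrac{1}{2}(1 + \tanh(x/2))$, Swish is $x\,\sigma(x)$, and GELU uses the widely used tanh-based approximation, which is itself approximate relative to the erf-based definition.

```python
import math


def tanh_approx(x: float) -> float:
    # Hypothetical placeholder: a K-TanH-style kernel would be plugged in here.
    # This sketch just defers to the library tanh.
    return math.tanh(x)


def sigmoid(x: float, tanh=tanh_approx) -> float:
    # Exact identity: sigmoid(x) = (1 + tanh(x / 2)) / 2
    return 0.5 * (1.0 + tanh(0.5 * x))


def swish(x: float, tanh=tanh_approx) -> float:
    # Swish (with beta = 1): x * sigmoid(x)
    return x * sigmoid(x, tanh)


def gelu(x: float, tanh=tanh_approx) -> float:
    # Common tanh-based GELU approximation (not the exact erf form)
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + tanh(c * (x + 0.044715 * x ** 3)))
```

Because every derived function takes the TanH kernel as a parameter, any error in the kernel propagates directly to Sigmoid, Swish, and GELU, so the accuracy of the derived activations is bounded by the accuracy of the underlying TanH approximation.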