LG CL CVMar 12, 2024

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

arXiv:2403.07339v14.61 citationsh-index: 38ICML

Originality Incremental advance

AI Analysis

This addresses efficiency improvements in deep learning for practitioners by enabling low-precision integer use without sophisticated error control, though it is incremental as it builds on existing low-bit strategies.

The paper tackles the challenge of using low bit-width integers for all GEMM operations in Transformer-based models to improve efficiency, and finds that while most entries are representable, a few large entries hinder this. They develop IM-Unpack, a simple algorithm that unpacks matrices to allow exact results with arbitrarily low bit-width integers, showing small overhead for many models.

GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use of low bit-width integers to approximate the original entries in a matrix. This allows efficiency gains, but often requires sophisticated techniques to control the rounding error incurred. In this work, we first verify/check that when the low bit-width restriction is removed, for a variety of Transformer-based models, whether integers are sufficient for all GEMMs need -- for {\em both} training and inference stages, and can achieve parity with floating point counterparts. No sophisticated techniques are needed. We find that while a large majority of entries in matrices (encountered in such models) can be easily represented by {\em low} bit-width integers, the existence of a few heavy hitter entries make it difficult to achieve efficiency gains via the exclusive use of low bit-width GEMMs alone. To address this issue, we develop a simple algorithm, Integer Matrix Unpacking (IM-Unpack), to {\em unpack} a matrix with large integer entries into a larger matrix whose entries all lie within the representable range of arbitrarily low bit-width integers. This allows {\em equivalence} with the original GEMM, i.e., the exact result can be obtained using purely low bit-width integer GEMMs. This comes at the cost of additional operations -- we show that for many popular models, this overhead is quite small.

View on arXiv PDF

Similar