LG AI AR | Feb 16, 2023

With Shared Microexponents, A Little Shifting Goes a Long Way

Bita Rouhani, Ritchie Zhao, Venmugil Elango, Rasoul Shafipour, Mathew Hall, Maral Mesmakhosroshahi, Ankit More, Levi Melnick, Maximilian Golub, Girish Varatkar, Lei Shao, Gaurav Kolhe

arXiv:2302.08007v2 | 25.780 citations | h-index: 57

Originality: Incremental advance

AI Analysis

The paper addresses the need for better quantization methods to make deep learning computation more efficient. It appears incremental because it builds on existing quantization standards.

The paper tackles the problem of efficient narrow-precision formats for deep learning by introducing Block Data Representations (BDR) and shared microexponents (MX), which outperform state-of-the-art quantization approaches in real-world models like generative pretraining and recommendation systems.

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
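The two-level scaling described above can be illustrated with a minimal sketch. This is not the paper's format definition. The function name `mx_quantize`, the sub-block size of 2, the 1-bit shift, and the mantissa width are all assumptions chosen for illustration. The whole block shares one power-of-two exponent, and each small sub-block may additionally shift its scale down by one bit when all of its values fit in half the block's range.

```python
import math


def mx_quantize(block, sub_size=2, mant_bits=7, use_shift=True):
    """Quantize one block and return the dequantized values.

    Hypothetical sketch: one shared exponent per block (level 1) plus a
    1-bit shared microexponent per sub-block (level 2). All sizes are
    illustrative assumptions, not values taken from the summary above.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)

    # Level 1: shared exponent E = floor(log2(max_abs)), computed exactly
    # via frexp, which returns (m, e) with max_abs = m * 2**e, 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    shared_exp = e - 1

    limit = 2 ** mant_bits - 1  # largest mantissa magnitude
    out = []
    for i in range(0, len(block), sub_size):
        sub = block[i:i + sub_size]
        sub_max = max(abs(x) for x in sub)

        # Level 2: if every value in the sub-block is below 2**E, shift
        # the scale down by one bit to gain one bit of resolution.
        shift = 1 if use_shift and sub_max < 2.0 ** shared_exp else 0
        step = 2.0 ** (shared_exp + 1 - shift - mant_bits)

        for x in sub:
            q = round(x / step)              # integer mantissa
            q = max(-limit, min(limit, q))   # clamp to representable range
            out.append(q * step)
    return out


block = [1.5, -1.0, 0.3, 0.4]
print(mx_quantize(block, mant_bits=3))                   # with microexponents
print(mx_quantize(block, mant_bits=3, use_shift=False))  # shared exponent only
```

In this example the second sub-block holds only small values, so the one-bit shift halves its quantization step. Comparing the two printed lists shows how a single extra shared bit per sub-block can reduce error for small values that sit next to a large outlier in the same block.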
