LG · AI · NE · Jul 17, 2024

Mamba-PTQ: Outlier Channels in Recurrent Large Language Models

arXiv:2407.12397v1 · 15 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of efficient model compression for recurrent LLMs in resource-limited environments, but it is incremental as it identifies a known issue (outlier channels) in a new context without achieving significant performance gains.

The paper tackles the challenge of quantizing recurrent large language models (LLMs) such as Mamba for edge deployment. It shows that these models exhibit activation outlier channels similar to those in attention-based LLMs, and that these outliers hinder quantization. It reports baseline results without outlier handling and suggests initial steps toward outlier-aware quantization.

Modern recurrent layers are emerging as a promising path toward edge deployment of foundation models, especially in the context of large language models (LLMs). Compressing the whole input sequence into a finite-dimensional representation enables recurrent layers to model long-range dependencies while maintaining a constant inference cost for each token and a fixed memory requirement. However, the practical deployment of LLMs in resource-limited environments often requires further model compression, such as quantization and pruning. While these techniques are well-established for attention-based models, their effects on recurrent layers remain underexplored. In this preliminary work, we focus on post-training quantization for recurrent LLMs and show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs. We show that the difficulty of quantizing state space models (SSMs) is caused by activation outliers, similar to those observed in transformer-based LLMs. We report baseline results for post-training quantization of Mamba that do not take into account the activation outliers and suggest first steps for outlier-aware quantization.
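To make the outlier-channel problem concrete, the sketch below illustrates why a single large-magnitude channel hurts a naive quantizer. It is not the paper's method or data. The activations are synthetic Gaussian values, the outlier gain of 50 is an arbitrary assumption, and helper names such as `make_activations` and `per_tensor_error` are hypothetical. The quantizer is plain symmetric per-tensor absmax int8, used here only as a simple baseline that ignores outliers.

```python
import random


def absmax_scale(values, bits=8):
    """Scale that maps the largest magnitude onto the largest integer level."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    return max(abs(v) for v in values) / qmax


def quantize_dequantize(values, scale, bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # round, then clip
        out.append(q * scale)
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def make_activations(tokens, channels, outlier_channel=None,
                     outlier_gain=50.0, seed=0):
    """Synthetic (tokens x channels) activations; optionally inflate one channel.

    The gain of 50 is an illustrative assumption, not a measured value.
    """
    rng = random.Random(seed)
    acts = [[rng.gauss(0.0, 1.0) for _ in range(channels)]
            for _ in range(tokens)]
    if outlier_channel is not None:
        for row in acts:
            row[outlier_channel] *= outlier_gain
    return acts


def per_tensor_error(acts, skip_channel):
    """Per-tensor absmax int8 error, measured on the non-outlier channels only."""
    flat = [v for row in acts for v in row]
    scale = absmax_scale(flat)  # one scale shared by every channel
    normal = [v for row in acts
              for c, v in enumerate(row) if c != skip_channel]
    return mean_abs_error(normal, quantize_dequantize(normal, scale))


clean = make_activations(256, 16)
spiky = make_activations(256, 16, outlier_channel=3)

err_clean = per_tensor_error(clean, skip_channel=3)
err_spiky = per_tensor_error(spiky, skip_channel=3)
print(f"error without outlier channel: {err_clean:.4f}")
print(f"error with outlier channel:    {err_spiky:.4f}")
```

Because the shared scale is set by the largest magnitude in the tensor, the outlier channel stretches the quantization step for every other channel, so the ordinary channels lose most of their resolution. Outlier-aware schemes aim to avoid this, for example by treating such channels separately.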

