On Pruning State-Space LLMs
This work addresses efficiency improvements for state-space LLMs, an incremental contribution to model optimization.
The paper investigates whether state-space LLMs can be pruned to reduce computation costs, finding that they are robust to some pruning methods like WANDA but degrade quickly with others.
Recent work has proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to further reduce their computation costs? We adapt several pruning methods to the SSM structure and apply them to four SSM-based LLMs across multiple tasks. We find that such models are quite robust to some pruning methods (e.g., WANDA), whereas other methods lead to rapid performance degradation.
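For readers unfamiliar with WANDA, the following is a minimal sketch of its generally published pruning criterion applied to a single dense weight matrix: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over a calibration set, and the lowest-scoring weights in each output row are zeroed. It does not reproduce the SSM-specific adaptation described here. The function name `wanda_prune`, its parameters, and the toy inputs are illustrative assumptions.

```python
import numpy as np


def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row (hypothetical helper).

    weight:      array of shape (out_features, in_features)
    activations: calibration inputs of shape (n_tokens, in_features)
    sparsity:    fraction of weights to remove from every row
    """
    # L2 norm of each input feature across the calibration tokens.
    feature_norms = np.linalg.norm(activations, axis=0)

    # WANDA-style score: |W_ij| * ||X_j||_2.
    scores = np.abs(weight) * feature_norms[None, :]

    # Number of weights to drop per output row.
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune == 0:
        return pruned

    # Column indices of the n_prune smallest scores in each row.
    prune_idx = np.argpartition(scores, n_prune - 1, axis=1)[:, :n_prune]
    np.put_along_axis(pruned, prune_idx, 0.0, axis=1)
    return pruned


# Toy example: the first input feature has a large activation norm, so its
# small weight is kept even though plain magnitude pruning would remove it.
toy_weight = np.array([[1.0, 2.0, 3.0, 4.0]])
toy_activations = np.array([[10.0, 1.0, 1.0, 1.0]])
toy_pruned = wanda_prune(toy_weight, toy_activations, sparsity=0.5)
```

In the toy example the scores are 10, 2, 3, and 4, so the second and third weights are removed. This shows how the activation term can change which weights survive relative to magnitude-only pruning.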