LG CL | Mar 2, 2024

Dissecting Language Models: Machine Unlearning via Selective Pruning

arXiv:2403.01267v2 | 26.044 citations | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient control over LLM behaviors, but the advance is incremental because it builds on existing pruning techniques for unlearning.

The paper tackles the problem of shaping Large Language Models (LLMs) by introducing a selective pruning method for machine unlearning. The method removes neurons based on their importance for a targeted capability, and the results indicate that both feed-forward and attention neurons are specialized for specific tasks.

Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We introduce a selective pruning method for LLMs that removes neurons based on their relative importance on a targeted capability compared to overall network performance. This approach is a compute- and data-efficient method for identifying and removing neurons that enable specific behaviours. Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others. Code from all experiments is available at https://github.com/nickypro/selective-pruning
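The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that importance is measured as mean absolute activation, that the score is the ratio of importance on a "forget" dataset to importance on a "retain" dataset, and that pruning means zeroing a neuron's output. The function names, the prune_fraction parameter, and the toy activations are all hypothetical.

```python
def importance(activations):
    """Mean absolute activation per neuron (an assumed importance measure).

    `activations` is a list of rows, one row per token, one column per neuron.
    """
    n_tokens = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_tokens
        for j in range(n_neurons)
    ]


def select_neurons_to_prune(forget_acts, retain_acts, prune_fraction=0.25, eps=1e-8):
    """Rank neurons by forget-importance relative to retain-importance.

    Returns the sorted indices of the top `prune_fraction` of neurons and
    the per-neuron scores. A high score means the neuron matters much more
    for the targeted capability than for general performance.
    """
    forget_imp = importance(forget_acts)
    retain_imp = importance(retain_acts)
    scores = [f / (r + eps) for f, r in zip(forget_imp, retain_imp)]
    n_prune = int(len(scores) * prune_fraction)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_prune]), scores


def apply_mask(activations, pruned):
    """Zero the output of every pruned neuron (one simple way to 'remove' it)."""
    pruned = set(pruned)
    return [
        [0.0 if j in pruned else v for j, v in enumerate(row)]
        for row in activations
    ]


# Toy activations: 3 tokens per dataset, 4 neurons (hypothetical values).
forget_acts = [[0.1, 2.0, 0.5, 0.1], [0.2, 1.8, 0.4, 0.1], [0.1, 2.2, 0.6, 0.1]]
retain_acts = [[1.0, 0.1, 0.5, 0.1], [1.2, 0.2, 0.5, 0.1], [0.9, 0.1, 0.4, 0.1]]

pruned, scores = select_neurons_to_prune(forget_acts, retain_acts, prune_fraction=0.25)
masked = apply_mask(forget_acts, pruned)
print("pruned neurons:", pruned)
```

In this toy data, neuron 1 is highly active on the forget set and nearly silent on the retain set, so it is the one selected. Because only activation statistics are needed, the selection step requires forward passes but no gradient updates, which is consistent with the compute- and data-efficiency claim above.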
