CL · LG · Jul 12, 2025

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

arXiv:2507.09185v1 · 2 citations · h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses poor generalization in LLMs, which matters to users who need robust performance across tasks. The advance is incremental because it builds on existing pruning and fine-tuning techniques.

The authors tackle dataset-specific mechanisms in large language models that degrade generalization. They report that pruning the neurons associated with these mechanisms, as part of a fine-tuning approach, significantly improves performance on multiple-choice benchmarks and surpasses prior adaptation methods.

Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations. Evaluated across multiple-choice benchmarks, our pruning-based fine-tuning significantly enhances performance, surpassing prior (non-pruning) adaptation methods.
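
To make the attribute-then-prune idea in the abstract concrete, here is a minimal sketch on a toy model. It is not the authors' implementation. The one-layer sigmoid "confidence head", the all-zeros baseline, the midpoint Riemann sum, the top-k selection rule, and every name in it (`confidence`, `integrated_gradients`, `top_k_neurons`, `prune`) are assumptions chosen for illustration. It uses only the Python standard library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence(hidden, w_out):
    """Toy prediction head (assumption): confidence = sigmoid(w_out . hidden)."""
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

def integrated_gradients(hidden, w_out, steps=200):
    """Integrated Gradients of the confidence w.r.t. each hidden activation.

    Baseline is the all-zeros activation vector (assumption). The path
    integral is approximated with a midpoint Riemann sum over `steps` points.
    """
    n = len(hidden)
    grad_sums = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps                # midpoint of the k-th interval
        scaled = [alpha * h for h in hidden]     # point on the baseline -> input path
        p = confidence(scaled, w_out)
        dp_dz = p * (1.0 - p)                    # derivative of the sigmoid
        for j in range(n):
            grad_sums[j] += dp_dz * w_out[j]     # d(confidence) / d(hidden_j)
    # IG_j = (hidden_j - baseline_j) * average gradient along the path
    return [hidden[j] * grad_sums[j] / steps for j in range(n)]

def top_k_neurons(attributions, k):
    """Indices of the k neurons with the largest attribution (hypothetical rule)."""
    order = sorted(range(len(attributions)), key=lambda j: attributions[j], reverse=True)
    return order[:k]

def prune(w_out, neuron_ids):
    """Zero the outgoing weight of each selected neuron."""
    return [0.0 if j in neuron_ids else w for j, w in enumerate(w_out)]

# Made-up activations and weights for one high-confidence prediction.
hidden = [2.0, 0.5, 1.0, 0.1]
w_out = [1.5, -0.4, 0.3, 0.2]

attr = integrated_gradients(hidden, w_out)
pruned_ids = top_k_neurons(attr, k=1)
w_pruned = prune(w_out, set(pruned_ids))

print("attributions:", [round(a, 4) for a in attr])
print("pruned neurons:", pruned_ids)
print("confidence before:", round(confidence(hidden, w_out), 4))
print("confidence after: ", round(confidence(hidden, w_pruned), 4))
```

In this toy setting one neuron dominates the high-confidence prediction, so pruning it sharply lowers the confidence. In the setting the abstract describes, attributions would presumably be computed over many examples inside a transformer, and the pruned model would then be fine-tuned so that it relies on the remaining representations.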
