LG · Mar 19, 2025

Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

arXiv:2503.15573v2 · 3 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in instruction tuning for researchers and practitioners aiming to optimize LLMs for specific tasks, representing an incremental improvement over prior distribution alignment methods.

The paper tackles the problem of selecting the instruction-tuning data most relevant to a specific target task in large language models. It represents each sample by monosemantic neuronal activations obtained with sparse autoencoders, and reports that this approach consistently outperforms existing data selection baselines in both stability and task-specific performance across multiple datasets, models, tasks, and selection ratios.

Instruction tuning improves the ability of large language models (LLMs) to follow diverse human instructions, but achieving strong performance on specific target tasks remains challenging. A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment methods, the latter of which critically rely on the underlying sample representation. In practice, most distribution alignment methods, from shallow features (e.g., BM25) to neural embeddings (e.g., BGE, LLM2Vec), may fail to capture how the model internally processes samples. To bridge this gap, we adopt a model-centric strategy in which each sample is represented by its neuronal activation pattern in the model, directly reflecting internal computation. However, directly using raw neuron activations leads to spurious similarity between unrelated samples due to neuron polysemanticity, where a single neuron may respond to multiple, unrelated concepts. To address this, we employ sparse autoencoders to disentangle polysemantic activations into sparse, monosemantic representations, and introduce a dedicated similarity metric for this space to better identify task-relevant data. Comprehensive experiments across multiple instruction datasets, models, tasks, and selection ratios show that our approach consistently outperforms existing data selection baselines in both stability and task-specific performance.
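
The pipeline described in the abstract (encode each sample's activations into a sparse representation, then rank candidate samples by similarity to target-task examples) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not give the sparse autoencoder architecture or the paper's dedicated similarity metric, so this sketch swaps in a toy top-k ReLU encoder and a generalized (weighted) Jaccard similarity. The names `sae_encode`, `weighted_jaccard`, and `select_for_task` are hypothetical.

```python
import numpy as np


def sae_encode(acts, W_enc, b_enc, k):
    """Toy sparse-autoencoder encoder (an assumption, not the paper's SAE).

    Computes ReLU(acts @ W_enc + b_enc) and keeps only the k largest
    features per sample, zeroing the rest.
    """
    z = np.maximum(acts @ W_enc + b_enc, 0.0)
    if 0 < k < z.shape[1]:
        # Indices of every feature except the k largest in each row.
        drop = np.argpartition(z, -k, axis=1)[:, :-k]
        np.put_along_axis(z, drop, 0.0, axis=1)
    return z


def weighted_jaccard(a, B):
    """Generalized Jaccard similarity between vector a and each row of B.

    Stand-in metric for non-negative sparse codes; the paper's own
    metric is not specified in the abstract.
    """
    num = np.minimum(a, B).sum(axis=1)
    den = np.maximum(a, B).sum(axis=1)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def select_for_task(cand_codes, task_codes, ratio):
    """Rank candidates by mean similarity to the target-task examples.

    Returns the indices of the top `ratio` fraction (at least one
    sample) and the per-candidate scores.
    """
    scores = np.array(
        [weighted_jaccard(c, task_codes).mean() for c in cand_codes]
    )
    n_keep = max(1, int(round(ratio * len(cand_codes))))
    return np.argsort(-scores)[:n_keep], scores
```

In use, `cand_codes` and `task_codes` would be the outputs of `sae_encode` applied to activations collected from the model for the candidate pool and for a handful of target-task examples, and `ratio` would correspond to the selection ratio varied in the experiments.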

