LG · Feb 12

Dopamine: Brain Modes, Not Brains

arXiv:2602.11726v1 · h-index: 6

Originality: Incremental advance

AI Analysis

This addresses the interpretability challenge in PEFT, though the approach is incremental and its expressivity is limited when the frozen base model lacks features the target task needs.

The paper tackles the problem of interpreting parameter-efficient fine-tuning (PEFT) methods by proposing Dopamine, an activation-space technique that adapts models by selecting and rescaling existing computations rather than modifying weights. As a proof of concept on MNIST and 45°-rotated MNIST, it improves rotated-digit accuracy over the frozen baseline with only a few hundred trainable parameters per layer. Compared to LoRA, it trades some accuracy for fewer trainable parameters and better interpretability.

Parameter-efficient fine-tuning (PEFT) methods such as LoRA adapt large pretrained models by adding small weight-space updates. While effective, weight deltas are hard to interpret mechanistically, and they do not directly expose which internal computations are reused versus bypassed for a new task. We explore an alternative view inspired by neuromodulation: adaptation as a change in mode, that is, selecting and rescaling existing computations rather than rewriting the underlying weights. We propose Dopamine, a simple activation-space PEFT technique that freezes base weights and learns per-neuron thresholds and gains. During training, a smooth gate decides whether a neuron's activation participates; at inference the gate can be hardened to yield explicit conditional computation and neuron-level attributions. As a proof of concept, we study "mode specialization" on MNIST (0°) versus rotated MNIST (45°). We pretrain a small MLP on a 50/50 mixture (foundation), freeze its weights, and then specialize to the rotated mode using Dopamine. Across seeds, Dopamine improves rotated accuracy over the frozen baseline while using only a few hundred trainable parameters per layer, and exhibits partial activation sparsity (a minority of units strongly active). Compared to LoRA, Dopamine trades some accuracy for substantially fewer trainable parameters and a more interpretable "which-neurons-fire" mechanism. We discuss limitations, including reduced expressivity when the frozen base lacks features needed for the target mode.
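The threshold-and-gain mechanism described in the abstract can be illustrated with a small sketch. The abstract does not give the exact gate, so everything specific here is an assumption: the gate is taken to be a sigmoid of `sharpness * (activation - threshold)`, the modulated output is `gain * gate * activation`, and the names `modulate`, `sharpness`, and `hard` are hypothetical. It is meant only to show how a smooth training-time gate might be hardened at inference into an explicit list of participating neurons, not to reproduce the paper's implementation.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def modulate(activations, thresholds, gains, sharpness=10.0, hard=False):
    """Rescale frozen-layer activations with per-neuron thresholds and gains.

    ASSUMPTION: the gate form (sigmoid during training, step at inference)
    is illustrative; the source only says the gate is smooth and can be hardened.
    """
    out = []
    for a, t, g in zip(activations, thresholds, gains):
        if hard:
            gate = 1.0 if a > t else 0.0           # explicit conditional computation
        else:
            gate = sigmoid(sharpness * (a - t))    # smooth, differentiable gate
        out.append(g * gate * a)
    return out


# Hypothetical activations from one frozen layer with three neurons.
acts = [0.05, 0.8, 2.0]
thresholds = [0.5, 0.5, 0.5]   # trainable, one per neuron
gains = [1.0, 2.0, 0.5]        # trainable, one per neuron

soft = modulate(acts, thresholds, gains)             # training-time view
hard = modulate(acts, thresholds, gains, hard=True)  # inference-time view

# Neuron-level attribution: which units participate under the hard gate.
active = [i for i, v in enumerate(hard) if v != 0.0]

# Two trainable scalars per neuron, independent of the frozen weight shapes.
n_trainable = len(thresholds) + len(gains)
```

Under these assumptions the trainable parameter count is two scalars per neuron, which is consistent with the "few hundred trainable parameters per layer" reported for a small MLP, and the `active` list is the kind of "which-neurons-fire" readout the abstract refers to.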
