CL | Nov 13, 2023

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

arXiv:2311.07470v2 | 47 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses interpretability challenges in multi-modal LLMs for researchers and practitioners, offering incremental improvements in efficiency and editing capabilities.

The paper studies how multi-modal large language models integrate visual and textual concepts. It proposes a method that identifies key neurons without costly gradient computation, then builds a multi-modal knowledge editing method on those neurons to mitigate sensitive words or hallucination. The reported results validate both methods and highlight three key properties of multi-modal neurons: sensitivity, specificity and causal-effect.

Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continued improvement in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability, that is, how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on conventional approaches in efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method that helps mitigate sensitive words or hallucination. As the rationale for our design, we provide a theoretical assumption. For empirical evaluation, we conduct extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods but also offer insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity and causal-effect, shedding light for future research.
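
To make the idea of gradient-free neuron scoring concrete, here is a minimal sketch. It assumes a common decomposition in which a feed-forward neuron's direct contribution to a token's logit is its activation times the dot product of its output weights with that token's unembedding vector. This is an illustrative assumption, not necessarily the paper's exact formulation. The function names, the toy numbers, and the zeroing-based "edit" are all hypothetical.

```python
def neuron_contributions(activations, w_out, unembed_row):
    """Score each FFN neuron by its direct contribution to one token's logit.

    activations: list of m post-nonlinearity activations at one position
    w_out:       m x d output projection, one row per neuron
    unembed_row: length-d unembedding vector of the target token
    (All three are assumed inputs; no gradients are needed.)
    """
    scores = []
    for a_i, row in zip(activations, w_out):
        # Logit shift from this neuron alone: a_i * (w_out[i] . u_token)
        direction = sum(w * u for w, u in zip(row, unembed_row))
        scores.append(a_i * direction)
    return scores


def top_k_neurons(scores, k):
    """Indices of the k neurons with the largest contribution scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def suppress(activations, neuron_ids):
    """Toy edit (assumption): zero the activations of the selected neurons."""
    blocked = set(neuron_ids)
    return [0.0 if i in blocked else a for i, a in enumerate(activations)]


# Hypothetical toy layer: 3 neurons, hidden size 2.
acts = [0.5, 2.0, 3.0]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u_token = [2.0, -1.0]

scores = neuron_contributions(acts, w_out, u_token)  # one score per neuron
top = top_k_neurons(scores, 1)                       # highest-scoring neuron
edited = suppress(acts, top)                         # activations after the edit
```

Under this assumption, scoring only needs a forward pass and two weight matrices, which is where the efficiency gain over gradient-based attribution would come from. Suppressing the top-scoring neurons and checking how the target token's logit changes is one simple way to probe the causal effect mentioned in the abstract.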

Code Implementations: 1 repo