CL, AI, LG | Sep 30, 2024

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

arXiv:2410.00153v3 | 0.20 | 21 citations | h-index: 10
AI Analysis: 55

This work addresses the lack of robustness in concept probing for LLMs. Robust probes matter for interpretability and for control in real-world applications. The contribution is incremental, however, because it builds on existing linear probing methods.

The authors tackle the problem of robustly representing concepts in LLMs. They propose the Gaussian Concept Subspace (GCS), which approximates a concept subspace instead of relying on a single vector. They show its effectiveness on faithfulness, plausibility, and emotion-steering tasks, where it balances steering performance against generation fluency.

Probing learned concepts in large language models (LLMs) is crucial for understanding how semantic knowledge is encoded internally. Training linear classifiers on probing tasks is a principal approach to identifying the vector of a given concept in the representation space. However, the single vector identified for a concept varies with both the data and the training run. This makes it less robust and weakens its effectiveness in real-world applications. To address this challenge, we propose an approach that approximates the subspace representing a specific concept. Building on linear probing classifiers, we extend concept vectors into a Gaussian Concept Subspace (GCS). We demonstrate GCS's effectiveness by measuring its faithfulness and plausibility across multiple LLMs of different sizes and architectures. We also use representation intervention tasks to show its efficacy in real-world applications such as emotion steering. Experimental results indicate that GCS concept vectors have the potential to balance steering performance with fluency in natural language generation tasks.
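The idea in the abstract can be illustrated with a small NumPy sketch. The following are assumptions for illustration, not the authors' released code:

- several logistic-regression probes are trained on random subsets of labelled hidden states;
- a diagonal Gaussian is fit to their unit-normalized weight vectors;
- concept vectors are sampled from that Gaussian;
- steering is an additive intervention `h + alpha * v`.

All function names and hyperparameters (`n_probes`, `subset_frac`, `alpha`) are hypothetical.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=200):
    """Hypothetical probe: logistic regression trained by full-batch gradient
    descent. Returns its unit-norm weight vector, used as one concept vector."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y  # gradient of binary cross-entropy w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w / np.linalg.norm(w)

def fit_gaussian_concept_subspace(X, y, n_probes=20, subset_frac=0.8, seed=0):
    """Assumption: train probes on random data subsets, then fit a diagonal
    Gaussian (per-dimension mean and std) to the resulting concept vectors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = int(subset_frac * n)
    vectors = []
    for _ in range(n_probes):
        idx = rng.choice(n, size=k, replace=False)  # random training subset
        vectors.append(train_linear_probe(X[idx], y[idx]))
    V = np.stack(vectors)  # shape (n_probes, d)
    return V.mean(axis=0), V.std(axis=0)

def sample_concept_vectors(mean, std, n_samples, seed=1):
    """Draw concept vectors from the fitted Gaussian and re-normalize them."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=(n_samples, mean.shape[0]))
    return samples / np.linalg.norm(samples, axis=1, keepdims=True)

def steer(hidden, concept_vec, alpha=4.0):
    """Assumed additive intervention: shift a hidden state along a concept vector."""
    return hidden + alpha * concept_vec
```

A usage example on synthetic "hidden states" whose concept label depends only on the first dimension:

```python
rng = np.random.default_rng(42)
d = 16
true_dir = np.zeros(d); true_dir[0] = 1.0
X = rng.normal(size=(400, d))
y = (X @ true_dir > 0).astype(float)
mean, std = fit_gaussian_concept_subspace(X, y)
vecs = sample_concept_vectors(mean, std, n_samples=5)
print(vecs @ true_dir)  # cosine similarity of each sample with the planted direction
```

The diagonal covariance keeps the estimate cheap when the hidden size is large. A full covariance would need far more probes than dimensions to be well conditioned.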

Code Implementations: 1 repo
