CL · AI · LG · Sep 30, 2024

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, Mengnan Du

arXiv:2410.00153v3 · 13.223 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the robustness of concept probing in LLMs, which matters for interpretability and control in real-world applications. It is an incremental advance, since it builds on existing linear probing methods.

The authors tackle the problem of robustly representing concepts in LLMs by proposing the Gaussian Concept Subspace (GCS), which approximates a concept's subspace instead of a single vector. They demonstrate its effectiveness on faithfulness, plausibility, and emotion steering tasks, where it balances steering performance and fluency.
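To make the idea concrete, here is a minimal NumPy sketch of modeling a concept as a Gaussian instead of a single vector. It assumes the Gaussian is estimated from the weight vectors of several logistic-regression probes, each trained on a random subset of the probing data, and that the covariance is diagonal. The helper names (`train_linear_probe`, `fit_gaussian_concept_subspace`, `sample_concept_vectors`) and all hyperparameters are hypothetical, and the authors' exact procedure may differ. Synthetic data with a planted concept direction stands in for real LLM hidden states.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=200):
    """Logistic-regression probe trained with full-batch gradient descent.

    Returns the unit-normalized weight vector, i.e. one candidate concept vector.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept | x)
        err = p - y                              # d(cross-entropy)/d(logit)
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)


def fit_gaussian_concept_subspace(X, y, n_probes=20, subset_frac=0.8, seed=0):
    """Hypothetical helper: train probes on random subsets, then fit a
    diagonal Gaussian (per-dimension mean and std) to their concept vectors."""
    rng = np.random.default_rng(seed)
    n = len(y)
    vectors = []
    for _ in range(n_probes):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        vectors.append(train_linear_probe(X[idx], y[idx]))
    V = np.stack(vectors)                  # shape (n_probes, d)
    return V.mean(axis=0), V.std(axis=0)


def sample_concept_vectors(mu, sigma, k, seed=1):
    """Draw k vectors from N(mu, diag(sigma^2)) and unit-normalize them."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(k, mu.shape[0]))
    return samples / np.linalg.norm(samples, axis=1, keepdims=True)


# Usage on synthetic data: the concept is "the first coordinate is positive".
rng = np.random.default_rng(42)
d = 16
true_dir = np.zeros(d)
true_dir[0] = 1.0
X = rng.normal(size=(400, d))
y = (X @ true_dir > 0).astype(float)

mu, sigma = fit_gaussian_concept_subspace(X, y)
samples = sample_concept_vectors(mu, sigma, k=5)
print("cosine with planted direction:", np.round(samples @ true_dir, 3))
```

Because each probe sees a different subset, `sigma` records how much the learned direction moves with the training data, which is exactly the variability a single probe vector hides. The sampled vectors differ from one another yet stay aligned with the planted direction.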

Probing learned concepts in large language models (LLMs) is crucial for understanding how semantic knowledge is encoded internally. Training linear classifiers on probing tasks is a principled approach to identifying the vector of a given concept in the representation space. However, the single vector identified for a concept varies with both the data and the training, which makes it less robust and weakens its effectiveness in real-world applications. To address this challenge, we propose an approach that approximates the subspace representing a specific concept. Building on linear probing classifiers, we extend concept vectors into a Gaussian Concept Subspace (GCS). We demonstrate the effectiveness of GCS by measuring its faithfulness and plausibility across multiple LLMs of different sizes and architectures. Additionally, we use representation intervention tasks to showcase its efficacy in real-world applications such as emotion steering. Experimental results indicate that GCS concept vectors have the potential to balance steering performance and the fluency of natural language generation.
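Representation intervention is commonly implemented as activation addition: a scaled concept vector is added to the hidden states of one layer. The toy below is a sketch under stated assumptions, not the paper's setup. It uses random activations, a hand-set probe, and a perturbed probe direction as a stand-in for a vector sampled from a GCS, all hypothetical. It shows the trade-off the abstract describes: raising the strength `alpha` pushes the probe's concept score up, but also moves the activations further from where they started. The function names and the relative-shift measure are illustrative assumptions; the paper evaluates fluency on generated text, which this sketch does not model.

```python
import numpy as np


def steer(hidden, concept_vec, alpha):
    """Activation addition: add alpha * (unit concept vector) to every position.

    hidden: (seq_len, d) activations from one layer.
    """
    v = concept_vec / np.linalg.norm(concept_vec)
    return hidden + alpha * v  # broadcasts over the sequence dimension


def concept_score(hidden, probe_w, probe_b=0.0):
    """Mean probe probability that the concept is present across positions."""
    logits = hidden @ probe_w + probe_b
    return float(np.mean(1.0 / (1.0 + np.exp(-logits))))


rng = np.random.default_rng(0)
d = 16
hidden = rng.normal(size=(8, d))                  # stand-in for one layer's activations
probe_w = np.zeros(d)
probe_w[0] = 1.0                                  # stand-in probe direction
concept_vec = probe_w + 0.1 * rng.normal(size=d)  # stand-in for a GCS-sampled vector

base_norm = np.linalg.norm(hidden, axis=1).mean()
results = []
for alpha in [0.0, 2.0, 4.0, 8.0]:
    steered = steer(hidden, concept_vec, alpha)
    score = concept_score(steered, probe_w)
    shift = np.linalg.norm(steered - hidden, axis=1).mean() / base_norm
    results.append((alpha, score, shift))
    print(f"alpha={alpha:.0f}  concept score={score:.3f}  relative shift={shift:.2f}")
```

In a real model the same addition would be applied inside a forward hook at a chosen layer. A larger `alpha` makes the target concept more pronounced but perturbs the representation more, which is why a steering vector that works at moderate strength is valuable for preserving fluency.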
