Extracting Conceptual Spaces from LLMs Using Prototype Embeddings
The work addresses the lack of practical methods for extracting conceptual spaces from LLMs, a gap that matters for explainable AI and cognitive science applications, though the contribution appears incremental.
The paper proposes to extract conceptual spaces from LLMs by encoding each feature as the embedding of a prototype description and fine-tuning the LLM to align these prototype embeddings with the corresponding conceptual space dimensions. Its empirical analysis finds the approach to be highly effective.
Conceptual spaces represent entities and concepts using cognitively meaningful dimensions, typically referring to perceptual features. Such representations are widely used in cognitive science and have the potential to serve as a cornerstone for explainable AI. Unfortunately, they have proven notoriously difficult to learn, although recent LLMs appear to capture the required perceptual features to a remarkable extent. Nonetheless, practical methods for extracting the corresponding conceptual spaces are currently still lacking. While various methods exist for extracting embeddings from LLMs, extracting conceptual spaces also requires us to encode the underlying features. In this paper, we propose a strategy in which features (e.g. sweetness) are encoded by embedding the description of a corresponding prototype (e.g. a very sweet food). To improve this strategy, we fine-tune the LLM to align the prototype embeddings with the corresponding conceptual space dimensions. Our empirical analysis finds this approach to be highly effective.
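As a rough illustration of the prototype idea (not the authors' implementation), the sketch below ranks entities along one feature by comparing their embeddings with the embedding of a prototype description. The vectors are made-up toy values standing in for LLM embeddings, the function names are hypothetical, and cosine similarity is an assumed scoring rule that the abstract does not specify. The fine-tuning step is not shown.

```python
import numpy as np


def cosine_scores(entity_vecs, prototype_vec):
    """Cosine similarity between each entity embedding and one prototype embedding."""
    entity_vecs = np.asarray(entity_vecs, dtype=float)
    prototype_vec = np.asarray(prototype_vec, dtype=float)
    # Normalise rows and the prototype so the dot product is a cosine.
    unit_entities = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    unit_prototype = prototype_vec / np.linalg.norm(prototype_vec)
    return unit_entities @ unit_prototype


def rank_by_feature(names, entity_vecs, prototype_vec):
    """Order entity names from most to least similar to the prototype."""
    scores = cosine_scores(entity_vecs, prototype_vec)
    order = np.argsort(-scores)  # descending similarity
    return [names[i] for i in order]


# Toy stand-ins for LLM embeddings (assumed values, for illustration only).
names = ["lemon", "bread", "honey"]
entity_vecs = np.array([
    [0.1, 0.9, 0.2],  # lemon
    [0.4, 0.3, 0.8],  # bread
    [0.9, 0.1, 0.3],  # honey
])
# Stand-in for the embedding of a prototype description such as "a very sweet food".
sweet_prototype = np.array([1.0, 0.0, 0.2])

ranking = rank_by_feature(names, entity_vecs, sweet_prototype)
print(ranking)
```

In a real setting, the toy arrays would be replaced by embeddings obtained from the LLM for the entity names and for the prototype description, and the resulting ranking would be compared against human judgments of the feature.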