Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
This work addresses LLM interpretability for researchers and offers insight into internal mechanisms, but it is incremental: it builds on existing activation patching techniques without introducing a new paradigm.
The paper tackles the problem of understanding how Large Language Models (LLMs) internally represent factual knowledge during sentence-level claim verification. It develops a framework that decodes token representations into a dynamic knowledge graph, revealing the layer-wise evolution of that knowledge and the centrality of entities in reasoning.
Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and internal mechanisms in exploiting this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we rely on neither training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. The global analysis, on the other hand, reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.
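The notion of a dynamic knowledge graph built from per-layer ground predicates can be illustrated with a minimal sketch. It assumes the decoding step has already produced a set of (predicate, subject, object) triples for each layer. The triples below are hand-written placeholders, not outputs of the paper's framework, and the names `layer_predicates`, `graph_deltas`, and `degree_centrality` are hypothetical helpers introduced only for illustration. The sketch tracks which facts appear or disappear between consecutive layers and uses a simple degree count as a stand-in for entity centrality.

```python
from collections import Counter

# Hypothetical decoded output: layer index -> set of (predicate, subject, object).
# Placeholder triples for illustration only; they are not results from the paper.
layer_predicates = {
    0: {("IsA", "Paris", "word")},
    1: {("IsA", "Paris", "city")},
    2: {("IsA", "Paris", "city"), ("CapitalOf", "Paris", "France")},
}


def graph_deltas(snapshots):
    """Yield (layer, added, removed) between consecutive layer snapshots."""
    previous = set()
    for layer in sorted(snapshots):
        current = snapshots[layer]
        yield layer, current - previous, previous - current
        previous = current


def degree_centrality(triples):
    """Count how many triples each entity takes part in (subject or object)."""
    counts = Counter()
    for _, subj, obj in triples:
        counts[subj] += 1
        counts[obj] += 1
    return counts


deltas = list(graph_deltas(layer_predicates))
final_centrality = degree_centrality(layer_predicates[2])

for layer, added, removed in deltas:
    print(f"layer {layer}: +{sorted(added)} -{sorted(removed)}")
print(final_centrality.most_common())
```

In this toy input, the word-level triple at layer 0 is replaced by a claim-related fact at later layers, which mirrors the kind of trend the global analysis describes. A real analysis would likely use a graph library and a richer centrality measure; the degree count here is a deliberate simplification.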