Probing Large Language Models from A Human Behavioral Perspective
This work addresses the interpretation of LLM internal mechanisms, a problem of interest to NLP researchers. The contribution is incremental: it builds on existing probing methods and adds a human behavioral perspective.
The paper tackles the problem of understanding the internal mechanisms of Large Language Models (LLMs) by probing them from a human behavioral perspective, correlating values extracted from LLMs with eye-tracking measures. The results show that LLMs exhibit prediction patterns similar to those of humans but distinct from those of Shallow Language Models (SLMs), and that the correlation coefficients in the feed-forward networks (FFN) and multi-head self-attention (MHSA) increase with layer depth from the middle layers onward.
Large Language Models (LLMs) have emerged as dominant foundational models in modern NLP. However, their prediction processes and internal mechanisms, such as feed-forward networks (FFN) and multi-head self-attention (MHSA), remain largely unexplored. In this work, we probe LLMs from a human behavioral perspective, correlating values from LLMs with eye-tracking measures, which are widely recognized as meaningful indicators of human reading patterns. Our findings reveal that LLMs exhibit a prediction pattern similar to that of humans but distinct from that of Shallow Language Models (SLMs). Moreover, as layer depth increases from the middle layers onward, the correlation coefficients in FFN and MHSA also increase, indicating that the logits within FFN increasingly encapsulate word semantics suitable for predicting tokens from the vocabulary.
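The layer-wise correlation analysis described above can be illustrated with a minimal sketch. It assumes that each layer yields one scalar value per word and that a single eye-tracking measure (for example, a fixation duration) is available for the same words. The text does not state which correlation coefficient is used, so Pearson's r is an assumption here, and the function names `pearson` and `layerwise_correlation` as well as all numbers are hypothetical and purely illustrative.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def layerwise_correlation(
    layer_values: Sequence[Sequence[float]],
    eye_measure: Sequence[float],
) -> List[float]:
    """Correlate each layer's per-word values with one eye-tracking measure."""
    return [pearson(values, eye_measure) for values in layer_values]


# Toy, made-up numbers (NOT data from the paper): one eye-tracking
# measure per word, and one model-derived value per word for each layer.
eye_measure = [210.0, 180.0, 250.0, 300.0, 190.0]
layer_values = [
    [0.5, 0.1, 0.4, 0.2, 0.6],  # hypothetical earlier layer
    [0.3, 0.2, 0.5, 0.6, 0.1],  # hypothetical middle layer
    [0.4, 0.1, 0.8, 1.3, 0.2],  # hypothetical later layer
]

for index, r in enumerate(layerwise_correlation(layer_values, eye_measure)):
    print(f"layer {index}: r = {r:.3f}")
```

Plotting the resulting coefficients against the layer index is one way to inspect whether the correlation rises from the middle layers onward, as the findings report. A rank-based coefficient such as Spearman's could be substituted if the relationship is expected to be monotonic rather than linear.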