Zhengqi He

3 papers

1 citation

Novelty 53%

AI Score 23

Ranked #182,245 of 201,326 authors (top 91%); #30,291 in CL (top 93%)

3 Papers

CL · Oct 13, 2022

Spontaneous Emerging Preference in Two-tower Language Model

Zhengqi He, Taro Toyoizumi

The ever-growing size of foundation language models has brought significant performance gains across many types of downstream tasks. Because the large size of foundation language models also brings side effects such as deployment cost, availability issues, and environmental cost, there is interest in exploring other directions, such as a divide-and-conquer scheme. In this paper, we ask a basic question: are language processes naturally divisible? We study this problem in a simple two-tower language model setting, where two language models with identical configurations are trained side by side cooperatively. In this setting, we discover the spontaneous emerging preference phenomenon: some tokens are consistently better predicted by one tower, while others are better predicted by the other tower. This phenomenon is qualitatively stable regardless of model configuration and type, suggesting that it is an intrinsic property of natural language. This study suggests that interesting properties of natural language are still waiting to be discovered, which may aid the future development of natural language processing techniques.
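The abstract does not say how the two towers' outputs are combined or exactly how preference is measured. As a hypothetical illustration only, assume each tower yields a per-position cross-entropy loss on the same text. One simple way to expose a per-token preference is then to compute, for each token type, the fraction of its occurrences that tower A predicts better than tower B. The helper name `token_preference` is invented for this sketch.

```python
from collections import defaultdict


def token_preference(tokens, loss_a, loss_b):
    """Fraction of each token type's occurrences where tower A beats tower B.

    Hypothetical analysis helper (not the paper's code): `tokens` is a
    sequence of token strings, and `loss_a` / `loss_b` are the per-position
    cross-entropy losses assumed to come from the two towers.
    A value near 1.0 means tower A is consistently better for that token,
    near 0.0 means tower B is, and near 0.5 means no preference.
    """
    if not (len(tokens) == len(loss_a) == len(loss_b)):
        raise ValueError("tokens and both loss sequences must have equal length")
    wins = defaultdict(int)    # occurrences where tower A has strictly lower loss
    counts = defaultdict(int)  # total occurrences of each token type
    for tok, la, lb in zip(tokens, loss_a, loss_b):
        counts[tok] += 1
        if la < lb:
            wins[tok] += 1
    return {tok: wins[tok] / counts[tok] for tok in counts}


# Toy example with made-up losses:
prefs = token_preference(
    ["the", "cat", "the", "sat"],
    [0.1, 2.0, 0.2, 1.5],  # tower A
    [0.5, 1.0, 0.4, 1.0],  # tower B
)
# prefs == {"the": 1.0, "cat": 0.0, "sat": 0.0}
```

On real data, a preference that stays near 0 or 1 across many occurrences and random seeds would be the kind of signal the abstract describes; this toy example only shows the bookkeeping.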

CL · Nov 17, 2023

Causal Graph in Language Model Rediscovers Cortical Hierarchy in Human Narrative Processing

Zhengqi He, Taro Toyoizumi

Understanding how humans process natural language has long been a vital research direction. The field of natural language processing (NLP) has recently experienced a surge in the development of powerful language models. These models have proven to be invaluable tools for studying another complex system known to process human language: the brain. Previous studies have demonstrated that the features of language models can be mapped to fMRI brain activity. This raises the question: is there a commonality between information processing in language models and the human brain? To estimate information flow patterns in a language model, we examined the causal relationships between different layers. Drawing inspiration from the workspace framework for consciousness, we hypothesized that features integrating more information would more accurately predict higher hierarchical brain activity. To validate this hypothesis, we classified language model features into two categories based on causal network measures: 'low in-degree' and 'high in-degree'. We subsequently compared the brain prediction accuracy maps for these two groups. Our results reveal that the difference in prediction accuracy follows a hierarchical pattern, consistent with the cortical hierarchy map revealed by activity time constants. This finding suggests a parallel between how language models and the human brain process linguistic information.
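The abstract does not specify how the causal relationships are estimated or where the low/high in-degree boundary lies. The sketch below assumes a binary directed adjacency matrix over language-model features is already given, splits features at the median in-degree, and computes, for each brain region, the difference in mean prediction accuracy between the two groups. The median split, the function names, and the data layout are all assumptions made for illustration.

```python
from statistics import mean, median


def in_degrees(adjacency):
    """In-degree of each node in a directed graph.

    adjacency[i][j] is truthy when feature i is assumed to causally
    influence feature j; self-loops on the diagonal are ignored.
    """
    n = len(adjacency)
    return [sum(1 for i in range(n) if i != j and adjacency[i][j]) for j in range(n)]


def split_by_in_degree(adjacency):
    """Hypothetical median split into 'low in-degree' and 'high in-degree' groups."""
    degrees = in_degrees(adjacency)
    threshold = median(degrees)
    low = [j for j, d in enumerate(degrees) if d <= threshold]
    high = [j for j, d in enumerate(degrees) if d > threshold]
    if not high:
        raise ValueError("all in-degrees are at or below the median; cannot split")
    return low, high


def accuracy_difference(accuracy, low, high):
    """Per-region mean accuracy of the high group minus that of the low group.

    accuracy[f][r] is the (assumed precomputed) accuracy with which
    feature f predicts the activity of brain region r.
    """
    n_regions = len(accuracy[0])
    return [
        mean(accuracy[f][r] for f in high) - mean(accuracy[f][r] for f in low)
        for r in range(n_regions)
    ]


# Toy example: a feed-forward chain where later features receive more inputs.
adjacency = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
low, high = split_by_in_degree(adjacency)  # ([0, 1], [2, 3])
accuracy = [[0.1, 0.2], [0.1, 0.2], [0.3, 0.2], [0.5, 0.2]]
diff = accuracy_difference(accuracy, low, high)  # roughly [0.3, 0.0]
```

The resulting per-region difference map is what would then be compared with a cortical hierarchy map; that comparison step is not sketched here.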

AI · Jan 8, 2021

Progressive Interpretation Synthesis: Interpreting Task Solving by Quantifying Previously Used and Unused Information

Zhengqi He, Taro Toyoizumi

A deep neural network is a good task solver, but it is difficult to make sense of how it operates, and people hold different views on how an interpretation of its operation should be formed. We look at this problem from a new perspective, in which the interpretation of task solving is synthesized by quantifying how much, and what, previously unused information is exploited in addition to the information used to solve previous tasks. First, after learning several tasks, the network acquires several information partitions, each related to one task. We propose that the network then learns the minimal information partition that supplements the previously learned partitions to represent the input more accurately. This extra partition is associated with un-conceptualized information that has not been used in previous tasks. We identify what un-conceptualized information is used and quantify its amount. To interpret how the network solves a new task, we quantify, as meta-information, how much information is extracted from each partition. We implement this framework with the variational information bottleneck technique and test it on the MNIST and CLEVR datasets. The framework is shown to compose information partitions and to synthesize experience-dependent interpretation in the form of meta-information. The system progressively improves the resolution of its interpretation with new experience by converting part of the un-conceptualized information partition into a task-related partition. It can also provide a visual interpretation by imaging the part of the previously un-conceptualized information that is needed to solve a new task.
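The abstract names the variational information bottleneck (VIB) but gives no equations. The sketch below shows the standard single-example VIB objective: a cross-entropy term plus a β-weighted KL divergence from each diagonal-Gaussian code to a standard normal prior. It assumes that each information partition has its own Gaussian code, and it treats the per-partition KL as a rough measure of how much information is drawn from that partition. The partition names and this reading of "meta-information" are assumptions, not the paper's confirmed method.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in nats, summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vib_loss(log_prob_correct, partitions, beta):
    """Single-example VIB objective with one Gaussian code per partition.

    log_prob_correct: decoder log-probability log q(y | z) of the true label.
    partitions: hypothetical mapping from partition name to the encoder's
        (mu, logvar) for that partition's code (an assumption of this sketch).
    beta: weight on the information (KL) penalty.

    Returns the loss and the per-partition KL. Averaged over data, the KL
    term upper-bounds the information the code carries about the input, so
    it is used here as an assumed per-partition information estimate.
    """
    kl = {name: gaussian_kl(mu, lv) for name, (mu, lv) in partitions.items()}
    loss = -log_prob_correct + beta * sum(kl.values())
    return loss, kl


# Toy example: the task-related code matches the prior (carries no information),
# while the hypothetical un-conceptualized code is shifted away from it.
partitions = {
    "task_1": ([0.0, 0.0], [0.0, 0.0]),
    "unconceptualized": ([1.0, 0.0], [0.0, 0.0]),
}
loss, kl = vib_loss(math.log(0.5), partitions, beta=0.1)
# kl == {"task_1": 0.0, "unconceptualized": 0.5}; loss is about 0.743
```

In a trained model, `mu` and `logvar` would come from encoder networks and the loss would be averaged over a batch; plain Python lists are used here only to keep the sketch self-contained.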