LGAug 18, 2023Code
CTP:A Causal Interpretable Model for Non-Communicable Disease Progression PredictionZhoujian Sun, Wenzhuo Zhang, Zhengxing Huang et al.
Non-communicable disease is the leading cause of death, emphasizing the need for accurate prediction of disease progression and informed clinical decision-making. Machine learning (ML) models have shown promise in this domain by capturing non-linear patterns within patient features. However, existing ML-based models cannot provide causal interpretable predictions and estimate treatment effects, limiting their decision-making perspective. In this study, we propose a novel model called causal trajectory prediction (CTP) to tackle the limitation. The CTP model combines trajectory prediction and causal discovery to enable accurate prediction of disease progression trajectories and uncover causal relationships between features. By incorporating a causal graph into the prediction process, CTP ensures that ancestor features are not influenced by the treatment of descendant features, thereby enhancing the interpretability of the model. By estimating the bounds of treatment effects, even in the presence of unmeasured confounders, the CTP provides valuable insights for clinical decision-making. We evaluate the performance of the CTP using simulated and real medical datasets. Experimental results demonstrate that our model achieves satisfactory performance, highlighting its potential to assist clinical decisions. Source code is in \href{https://github.com/DanielSun94/CFPA}{here}.
CLDec 8, 2022
The Neural Correlates of Linguistic Structure Building: Comments on Kazanina & Tavano (2022)Nai Ding
A recent perspective paper by Kazanina & Tavano (referred to as the KT perspective in the following) argues how neural oscillations cannot provide a potential neural correlate for syntactic structure building. The view that neural oscillations can provide a potential neural correlate for syntactic structure building is largely attributed to a study by Ding, Melloni, Zhang, Tian, and Poeppel in 2016 (referred to as the DMZTP study). The KT perspective is thought provoking, but has severe misinterpretations about the arguments in DMZTP and other studies, and contains contradictory conclusions in different parts of the perspective, making it impossible to understand the position of the authors. In the following, I summarize a few misinterpretations and inconsistent arguments in the KT perspective, and put forward a few suggestions for future studies.
CLJan 12, 2023
Language Cognition and Language Computation -- Human and Machine Language UnderstandingShaonan Wang, Nai Ding, Nan Lin et al.
Language understanding is a key scientific issue in the fields of cognitive and computer science. However, the two disciplines differ substantially in the specific research questions. Cognitive science focuses on analyzing the specific mechanism of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on the current progress and challenges. We then compare and contrast the research of language understanding in cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.
CLOct 17, 2023
Probing the Creativity of Large Language Models: Can models produce divergent semantic association?Honghua Chen, Nai Ding
Large language models possess remarkable capacity for processing language, but it remains unclear whether these models can further generate creative content. The present study aims to investigate the creative thinking of large language models through a cognitive perspective. We utilize the divergent association task (DAT), an objective measurement of creativity that asks models to generate unrelated words and calculates the semantic distance between them. We compare the results across different models and decoding strategies. Our findings indicate that: (1) When using the greedy search strategy, GPT-4 outperforms 96% of humans, while GPT-3.5-turbo exceeds the average human level. (2) Stochastic sampling and temperature scaling are effective to obtain higher DAT scores for models except GPT-4, but face a trade-off between creativity and stability. These results imply that advanced large language models have divergent semantic associations, which is a fundamental process underlying creativity.
CLOct 15, 2025Code
Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human BrainJingmin An, Yilong Song, Ruolin Yang et al.
Large Language Models (LLMs) demonstrate human-level or even superior language abilities, effectively modeling syntactic structures, yet the specific computational modules responsible remain unclear. A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. Our results show that models such as GPT-2, Gemma, Gemma 2, Llama 2, Llama 3.1, and GLM-4 process syntax in analogous layers, while the human brain relies on distinct cortical regions for different syntactic levels. Representational similarity analysis reveals a stronger alignment between LLM representations and the left hemisphere of the brain (dominant in language processing). Notably, upgraded models exhibit divergent trends: Gemma 2 shows greater brain similarity than Gemma, while Llama 3.1 shows less alignment with the brain compared to Llama 2. These findings offer new insights into the interpretability of LLM behavioral improvements, raising questions about whether these advancements are driven by human-like or non-human-like mechanisms, and establish HFTP as a valuable tool bridging computational linguistics and cognitive neuroscience. This project is available at https://github.com/LilTiger/HFTP.
CLJan 23
Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMsXinyu Gao, Shaonan Wang, Nai Ding
Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained competence. We introduce a checkpoint-compatible gated tree cross-attention (GTCA) branch that reads precomputed constituency chunk memory while leaving backbone architecture unchanged. Our design uses a token update mask and staged training to control the scope and timing of structural updates. Across benchmarks and Transformer backbones, GTCA strengthens syntactic robustness beyond continued-training baselines without compromising Multiple-Choice QA performance or commonsense reasoning, providing a practical checkpoint-compatible route to more syntax-robust decoder-only LLMs.
CVJan 12
Evaluating the encoding competence of visual language models using uncommon actionsChen Ling, Nai Ding
We propose UAIT (Uncommon-sense Action Image-Text) dataset, a new evaluation benchmark designed to test the semantic understanding ability of visual language models (VLMs) in uncommon-sense action scenes. Unlike previous datasets that focus on common visual scenes with statistical frequency advantages, UAIT challenges models with grammatically reasonable but semantically counter-common sense image-text pairs. Such tasks require models to go beyond superficial pattern recognition and demonstrate a deep understanding of agent-patient relationships and physical feasibility. To build UAIT, we designed a semi-automated process to synthesize high-quality uncommon-sense image-text samples using large language models, few-shot prompt engineering, and text-to-image generation. Each sample is accompanied by a carefully designed multiple-choice question to test the model's competence in fine-grained reasoning. We evaluate multiple state-of-the-art visual language models and compare them with models based on contrastive learning. Experiments show that all models perform significantly worse than humans in semantic judgment, especially in distinguishing grammatical correctness from semantic rationality. Further experiments show that even the lightweight model can improve its accuracy after fine-tuning, demonstrating the great potential of directional adaptation. This study not only reveals the key weaknesses of VLMs, but also provides diagnostic tools and research directions for the development of robust models with real visual semantic reasoning capabilities.
CVSep 3, 2023
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure CorrectionXiaoke Shang, Gehui Li, Zhiying Jiang et al.
The correction of exposure-related issues is a pivotal component in enhancing the quality of images, offering substantial implications for various computer vision tasks. Historically, most methodologies have predominantly utilized spatial domain recovery, offering limited consideration to the potentialities of the frequency domain. Additionally, there has been a lack of a unified perspective towards low-light enhancement, exposure correction, and multi-exposure fusion, complicating and impeding the optimization of image processing. In response to these challenges, this paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks. Our method introduces Holistic Frequency Attention and Dynamic Frequency Feed-Forward Network, which replace conventional correlation computation in the spatial-domain. They form a foundational building block that facilitates a U-shaped Holistic Dynamic Frequency Transformer as a filter to extract global information and dynamically select important frequency bands for image restoration. Complementing this, we employ a Laplacian pyramid to decompose images into distinct frequency bands, followed by multiple restorers, each tuned to recover specific frequency-band information. The pyramid fusion allows a more detailed and nuanced image restoration process. Ultimately, our structure unifies the three tasks of low-light enhancement, exposure correction, and multi-exposure fusion, enabling comprehensive treatment of all classical exposure errors. Benchmarking on mainstream datasets for these tasks, our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.
CLMay 6, 2023
Replicating Complex Dialogue Policy of Humans via Offline Imitation Learning with Supervised RegularizationZhoujian Sun, Chenyang Zhao, Zhengxing Huang et al.
Policy learning (PL) is a module of a task-oriented dialogue system that trains an agent to make actions in each dialogue turn. Imitating human action is a fundamental problem of PL. However, both supervised learning (SL) and reinforcement learning (RL) frameworks cannot imitate humans well. Training RL models require online interactions with user simulators, while simulating complex human policy is hard. Performances of SL-based models are restricted because of the covariate shift problem. Specifically, a dialogue is a sequential decision-making process where slight differences in current utterances and actions will cause significant differences in subsequent utterances. Therefore, the generalize ability of SL models is restricted because statistical characteristics of training and testing dialogue data gradually become different. This study proposed an offline imitation learning model that learns policy from real dialogue datasets and does not require user simulators. It also utilizes state transition information, which alleviates the influence of the covariate shift problem. We introduced a regularization trick to make our model can be effectively optimized. We investigated the performance of our model on four independent public dialogue datasets. The experimental result showed that our model performed better in the action prediction task.
CLFeb 15, 2022
On Tracking Dialogue State by Inheriting Slot Values in Mentioned Slot PoolsZhoujian Sun, Zhengxing Huang, Nai Ding
Dialogue state tracking (DST) is a component of the task-oriented dialogue system. It is responsible for extracting and managing slot values according to dialogue utterances, where each slot represents an essential part of the information to accomplish a task, and slot value is updated recurrently in each dialogue turn. However, many DST models cannot update slot values appropriately. These models may repeatedly inherit wrong slot values extracted in previous turns, resulting in the fail of the entire DST task. They cannot update indirectly mentioned slots well, either. This study designed a model with a mentioned slot pool (MSP) to tackle the update problem. The MSP is a slot-specific memory that records all mentioned slot values that may be inherited, and our model updates slot values according to the MSP and the dialogue context. Our model rejects inheriting the previous slot value when it predicates the value is wrong. Then, it re-extracts the slot value from the current dialogue context. As the contextual information accumulates with the dialogue progress, the new value is more likely to be correct. It also can track the indirectly mentioned slot by picking a value from the MSP. Experimental results showed our model reached state-of-the-art DST performance on MultiWOZ 2.1 and 2.2 datasets.
CLJul 13, 2021
Human Attention during Goal-directed Reading Comprehension Relies on Task OptimizationJiajie Zou, Yuran Zhang, Jialu Li et al.
The computational principles underlying attention allocation in complex goal-directed tasks remain elusive. Goal-directed reading, i.e., reading a passage to answer a question in mind, is a common real-world task that strongly engages attention. Here, we investigate what computational models can explain attention distribution in this complex task. We show that the reading time on each word is predicted by the attention weights in transformer-based deep neural networks (DNNs) optimized to perform the same reading task. Eye-tracking further reveals that readers separately attend to basic text features and question-relevant information during first-pass reading and rereading, respectively. Similarly, text features and question relevance separately modulate attention weights in shallow and deep DNN layers. Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task. Therefore, attention during real-world reading can be interpreted as the consequence of task optimization.
CLJun 23, 2021
PALRACE: Reading Comprehension Dataset with Human Data and Labeled RationalesJiajie Zou, Yuran Zhang, Peiqing Jin et al.
Pre-trained language models achieves high performance on machine reading comprehension (MRC) tasks but the results are hard to explain. An appealing approach to make models explainable is to provide rationales for its decision. To investigate whether human rationales can further improve current models and to facilitate supervised learning of human rationales, here we present PALRACE (Pruned And Labeled RACE), a new MRC dataset with human labeled rationales for 800 passages selected from the RACE dataset. We further classified the question to each passage into 6 types. Each passage was read by at least 26 human readers, who labeled their rationales to answer the question. It is demonstrated that models such as RoBERTa-large outperforms human readers in all 6 types of questions, including inference questions, but its performance can be further improved when having access to the human rationales. Simpler models and pre-trained models that are not fine-tuned based on the task benefit more from human rationales, and their performance can be boosted by more than 30% by rationales. With access to human rationales, a simple model based on the GloVe word embedding can reach the performance of BERT-base.
CLMay 24, 2021
Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension ModelsJieyu Lin, Jiajie Zou, Nai Ding
Pre-trained language models have achieved human-level performance on many Machine Reading Comprehension (MRC) tasks, but it remains unclear whether these models truly understand language or answer questions by exploiting statistical biases in datasets. Here, we demonstrate a simple yet effective method to attack MRC models and reveal the statistical biases in these models. We apply the method to the RACE dataset, for which the answer to each MRC question is selected from 4 options. It is found that several pre-trained language models, including BERT, ALBERT, and RoBERTa, show consistent preference to some options, even when these options are irrelevant to the question. When interfered by these irrelevant options, the performance of MRC models can be reduced from human-level performance to the chance-level performance. Human readers, however, are not clearly affected by these irrelevant options. Finally, we propose an augmented training method that can greatly reduce models' statistical biases.