CLNov 6, 2022
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation ThreadsYupeng Li, Haorui He, Shaonan Wang et al.
Target-specific stance detection on social media, which aims at classifying a textual data instance such as a post or a comment into a stance class of a target issue, has become an emerging opinion mining paradigm of importance. An example application would be to overcome vaccine hesitancy in combating the coronavirus pandemic. However, existing stance detection strategies rely merely on the individual instances which cannot always capture the expressed stance of a given target. In response, we address a new task called conversational stance detection which is to infer the stance towards a given target (e.g., COVID-19 vaccination) when given a data instance and its corresponding conversation thread. To tackle the task, we first propose a benchmarking conversational stance detection (CSD) dataset with annotations of stances and the structures of conversation threads among the instances based on six major social media platforms in Hong Kong. To infer the desired stances from both data instances and conversation threads, we propose a model called Branch-BERT that incorporates contextual information in conversation threads. Extensive experiments on our CSD dataset show that our proposed model outperforms all the baseline models that do not make use of contextual information. Specifically, it improves the F1 score by 10.3% compared with the state-of-the-art method in the SemEval-2016 Task 6 competition. This shows the potential of incorporating rich contextual information on detecting target-specific stances on social media platforms and implies a more practical way to construct future stance detection tasks.
CLNov 14, 2023
Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual AlignmentChong Li, Shaonan Wang, Jiajun Zhang et al.
Multilingual generative models obtain remarkable cross-lingual in-context learning capabilities through pre-training on large-scale corpora. However, they still exhibit a performance bias toward high-resource languages and learn isolated distributions of multilingual sentence representations, which may hinder knowledge transfer across languages. To bridge this gap, we propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences. It aligns the internal sentence representations across different languages via multilingual contrastive learning and aligns outputs by following cross-lingual instructions in the target language. Experimental results show that even with less than 0.1 {\textperthousand} of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models and mitigates the performance gap. Further analyses reveal that it results in a better internal multilingual representation distribution of multilingual models.
MTRL-SCIFeb 9, 2023
Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language ModelsMaciej P. Polak, Shrey Modi, Anna Latosinska et al.
Accurate and comprehensive material databases extracted from research papers are crucial for materials science and engineering, but their development requires significant human effort. With large language models (LLMs) transforming the way humans interact with text, LLMs provide an opportunity to revolutionize data extraction. In this study, we demonstrate a simple and efficient method for extracting materials data from full-text research papers leveraging the capabilities of LLMs combined with human supervision. This approach is particularly suitable for mid-sized databases and requires minimal to no coding or prior knowledge about the extracted property. It offers high recall and nearly perfect precision in the resulting database. The method is easily adaptable to new and superior language models, ensuring continued utility. We show this by evaluating and comparing its performance on GPT-3 and GPT-3.5/4 (which underlie ChatGPT), as well as free alternatives such as BART and DeBERTaV3. We provide a detailed analysis of the method's performance in extracting sentences containing bulk modulus data, achieving up to 90% precision at 96% recall, depending on the amount of human effort involved. We further demonstrate the method's broader effectiveness by developing a database of critical cooling rates for metallic glasses over twice the size of previous human curated databases.
CLOct 16, 2023
Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task LearningChong Li, Shaonan Wang, Yunhao Zhang et al.
Transformer-based models, even though achieving super-human performance on several downstream tasks, are often regarded as a black box and used as a whole. It is still unclear what mechanisms they have learned, especially their core module: multi-head attention. Inspired by functional specialization in the human brain, which helps to efficiently handle multiple tasks, this work attempts to figure out whether the multi-head attention module will evolve similar function separation under multi-tasking training. If it is, can this mechanism further improve the model performance? To investigate these questions, we introduce an interpreting method to quantify the degree of functional specialization in multi-head attention. We further propose a simple multi-task training method to increase functional specialization and mitigate negative information transfer in multi-task learning. Experimental results on seven pre-trained transformer models have demonstrated that multi-head attention does evolve functional specialization phenomenon after multi-task training which is affected by the similarity of tasks. Moreover, the multi-task training strategy based on functional specialization boosts performance in both multi-task learning and transfer learning without adding any parameters.
CLJan 12, 2023
Language Cognition and Language Computation -- Human and Machine Language UnderstandingShaonan Wang, Nai Ding, Nan Lin et al.
Language understanding is a key scientific issue in the fields of cognitive and computer science. However, the two disciplines differ substantially in the specific research questions. Cognitive science focuses on analyzing the specific mechanism of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on the current progress and challenges. We then compare and contrast the research of language understanding in cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.
CLApr 12
Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain AlignmentYang Cui, Jingyuan Sun, Yizheng Sun et al.
How the brain supports language across different languages is a basic question in neuroscience and a useful test for multilingual artificial intelligence. Neuroimaging has identified language-responsive brain regions across languages, but it cannot by itself show whether the underlying processing is shared or language-specific. Here we use six multilingual large language models (LLMs) as controllable systems and create targeted ``computational lesions'' by zeroing small parameter sets that are important across languages or especially important for one language. We then compare intact and lesioned models in predicting functional magnetic resonance imaging (fMRI) responses during 100 minutes of naturalistic story listening in native English, Chinese and French (112 participants). Lesioning a compact shared core reduces whole-brain encoding correlation by 60.32% relative to intact models, whereas language-specific lesions preserve cross-language separation in embedding space but selectively weaken brain predictivity for the matched native language. These results support a shared backbone with embedded specializations and provide a causal framework for studying multilingual brain-model alignment.
CLJan 27
Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia PhenotypesYifan Wang, Jichen Zheng, Jingyuan Sun et al.
Large language models (LLMs) increasingly exhibit human-like linguistic behaviors and internal representations that they could serve as computational simulators of language cognition. We ask whether LLMs can be systematically manipulated to reproduce language-production impairments characteristic of aphasia following focal brain lesions. Such models could provide scalable proxies for testing rehabilitation hypotheses, and offer a controlled framework for probing the functional organization of language. We introduce a clinically grounded, component-level framework that simulates aphasia by selectively perturbing functional components in LLMs, and apply it to both modular Mixture-of-Experts models and dense Transformers using a unified intervention interface. Our pipeline (i) identifies subtype-linked components for Broca's and Wernicke's aphasia, (ii) interprets these components with linguistic probing tasks, and (iii) induces graded impairments by progressively perturbing the top-k subtype-linked components, evaluating outcomes with Western Aphasia Battery (WAB) subtests summarized by Aphasia Quotient (AQ). Across architectures and lesioning strategies, subtype-targeted perturbations yield more systematic, aphasia-like regressions than size-matched random perturbations, and MoE modularity supports more localized and interpretable phenotype-to-component mappings. These findings suggest that modular LLMs, combined with clinically informed component perturbations, provide a promising platform for simulating aphasic language production and studying how distinct language functions degrade under targeted disruptions.
CLJan 23
Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMsXinyu Gao, Shaonan Wang, Nai Ding
Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained competence. We introduce a checkpoint-compatible gated tree cross-attention (GTCA) branch that reads precomputed constituency chunk memory while leaving backbone architecture unchanged. Our design uses a token update mask and staged training to control the scope and timing of structural updates. Across benchmarks and Transformer backbones, GTCA strengthens syntactic robustness beyond continued-training baselines without compromising Multiple-Choice QA performance or commonsense reasoning, providing a practical checkpoint-compatible route to more syntax-robust decoder-only LLMs.
CLMar 2, 2024
MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language ModelsYunhao Zhang, Xiaohan Zhang, Chong Li et al.
Pre-trained computational language models have recently made remarkable progress in harnessing the language abilities which were considered unique to humans. Their success has raised interest in whether these models represent and process language like humans. To answer this question, this paper proposes MulCogBench, a multi-modal cognitive benchmark dataset collected from native Chinese and English participants. It encompasses a variety of cognitive data, including subjective semantic ratings, eye-tracking, functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG). To assess the relationship between language models and cognitive data, we conducted a similarity-encoding analysis which decodes cognitive data based on its pattern similarity with textual embeddings. Results show that language models share significant similarities with human cognitive data and the similarity patterns are modulated by the data modality and stimuli complexity. Specifically, context-aware models outperform context-independent models as language stimulus complexity increases. The shallow layers of context-aware models are better aligned with the high-temporal-resolution MEG signals whereas the deeper layers show more similarity with the high-spatial-resolution fMRI. These results indicate that language models have a delicate relationship with brain language representations. Moreover, the results between Chinese and English are highly consistent, suggesting the generalizability of these findings across languages.
CLMar 26, 2024
MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain ActivitiesXinpei Zhao, Jingyuan Sun, Shaonan Wang et al.
Decoding continuous language from brain activity is a formidable yet promising field of research. It is particularly significant for aiding people with speech disabilities to communicate through brain signals. This field addresses the complex task of mapping brain signals to text. The previous best attempt reverse-engineered this process in an indirect way: it began by learning to encode brain activity from text and then guided text generation by aligning with predicted brain responses. In contrast, we propose a simple yet effective method that guides text reconstruction by directly comparing them with the predicted text embeddings mapped from brain activities. Comprehensive experiments reveal that our method significantly outperforms the current state-of-the-art model, showing average improvements of 77% and 54% on BLEU and METEOR scores. We further validate the proposed modules through detailed ablation studies and case analyses and highlight a critical correlation: the more precisely we map brain activities to text embeddings, the better the text reconstruction results. Such insight can simplify the task of reconstructing language from brain activities for future work, emphasizing the importance of improving brain-to-text-embedding mapping techniques.
CLMar 20, 2024
Computational Models to Study Language Processing in the Human Brain: A SurveyShaonan Wang, Jingyuan Sun, Yunhao Zhang et al.
Despite differing from the human language processing mechanism in implementation and algorithms, current language models demonstrate remarkable human-like or surpassing language capabilities. Should computational language models be employed in studying the brain, and if so, when and how? To delve into this topic, this paper reviews efforts in using computational models for brain research, highlighting emerging trends. To ensure a fair comparison, the paper evaluates various computational models using consistent metrics on the same dataset. Our analysis reveals that no single model outperforms others on all datasets, underscoring the need for rich testing datasets and rigid experimental control to draw robust conclusions in studies involving computational models.
CLApr 30, 2024
Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible ModelsYunhao Zhang, Shaonan Wang, Xinyi Dong et al.
Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, we conducted an analysis comparing encoding performance of various neural language models and psychologically plausible models. Our study utilized extensive multi-modal cognitive datasets, examining bilingual word and discourse levels. Surprisingly, our findings revealed that psychologically plausible models outperformed neural language models across diverse contexts, encompassing different modalities such as fMRI and eye-tracking, and spanning languages from English to Chinese. Among psychologically plausible models, the one incorporating embodied information emerged as particularly exceptional. This model demonstrated superior performance at both word and discourse levels, exhibiting robust prediction of brain activation across numerous regions in both English and Chinese.
CLAug 29, 2025
Discovering Semantic Subdimensions through Disentangled Conceptual RepresentationsYunhao Zhang, Shaonan Wang, Nan Lin et al.
Understanding the core dimensions of conceptual semantics is fundamental to uncovering how meaning is organized in language and the brain. Existing approaches often rely on predefined semantic dimensions that offer only broad representations, overlooking finer conceptual distinctions. This paper proposes a novel framework to investigate the subdimensions underlying coarse-grained semantic dimensions. Specifically, we introduce a Disentangled Continuous Semantic Representation Model (DCSRM) that decomposes word embeddings from large language models into multiple sub-embeddings, each encoding specific semantic information. Using these sub-embeddings, we identify a set of interpretable semantic subdimensions. To assess their neural plausibility, we apply voxel-wise encoding models to map these subdimensions to brain activation. Our work offers more fine-grained interpretable semantic subdimensions of conceptual meaning. Further analyses reveal that semantic dimensions are structured according to distinct principles, with polarity emerging as a key factor driving their decomposition into subdimensions. The neural correlates of the identified subdimensions support their cognitive and neuroscientific plausibility.
CLMay 15, 2025
Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive RoutingChunyu Ye, Yunhao Zhang, Jingyuan Sun et al.
Decoding language from the human brain remains a grand challenge for Brain-Computer Interfaces (BCIs). Current approaches typically rely on unimodal brain representations, neglecting the brain's inherently multimodal processing. Inspired by the brain's associative mechanisms, where viewing an image can evoke related sounds and linguistic representations, we propose a unified framework that leverages Multimodal Large Language Models (MLLMs) to align brain signals with a shared semantic space encompassing text, images, and audio. A router module dynamically selects and fuses modality-specific brain features according to the characteristics of each stimulus. Experiments on various fMRI datasets with textual, visual, and auditory stimuli demonstrate state-of-the-art performance, achieving an 8.48% improvement on the most commonly used benchmark. We further extend our framework to EEG and MEG data, demonstrating flexibility and robustness across varying temporal and spatial resolutions. To our knowledge, this is the first unified BCI architecture capable of robustly decoding multimodal brain activity across diverse brain signals and stimulus types, offering a flexible solution for real-world applications.
CVFeb 22, 2024
A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across DatasetsChengzhang Yu, Xianjun Yang, Wenxia Bao et al.
In environments where RGB images are inadequate, pressure maps is a viable alternative, garnering scholarly attention. This study introduces a novel self-supervised pressure map keypoint detection (SPMKD) method, addressing the current gap in specialized designs for human keypoint extraction from pressure maps. Central to our contribution is the Encoder-Fuser-Decoder (EFD) model, which is a robust framework that integrates a lightweight encoder for precise human keypoint detection, a fuser for efficient gradient propagation, and a decoder that transforms human keypoints into reconstructed pressure maps. This structure is further enhanced by the Classification-to-Regression Weight Transfer (CRWT) method, which fine-tunes accuracy through initial classification task training. This innovation not only enhances human keypoint generalization without manual annotations but also showcases remarkable efficiency and generalization, evidenced by a reduction to only $5.96\%$ in FLOPs and $1.11\%$ in parameter count compared to the baseline methods.
CLMay 26, 2025
How Syntax Specialization Emerges in Language ModelsXufeng Duan, Zhaoqian Yao, Yunhao Zhang et al.
Large language models (LLMs) have been found to develop surprising internal specializations: Individual neurons, attention heads, and circuits become selectively sensitive to syntactic structure, reflecting patterns observed in the human brain. While this specialization is well-documented, how it emerges during training and what influences its development remains largely unknown. In this work, we tap into the black box of specialization by tracking its formation over time. By quantifying internal syntactic consistency across minimal pairs from various syntactic phenomena, we identify a clear developmental trajectory: Syntactic sensitivity emerges gradually, concentrates in specific layers, and exhibits a 'critical period' of rapid internal specialization. This process is consistent across architectures and initialization parameters (e.g., random seeds), and is influenced by model scale and training data. We therefore reveal not only where syntax arises in LLMs but also how some models internalize it during training. To support future research, we will release the code, models, and training checkpoints upon acceptance.
CVMay 26, 2023
Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain ActivitiesJingyuan Sun, Mingxiao Li, Zijiao Chen et al.
Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.
CLAug 31, 2019
NCLS: Neural Cross-Lingual SummarizationJunnan Zhu, Qian Wang, Yining Wang et al.
Cross-lingual summarization (CLS) is the task to produce a summary in one particular language for a source document in a different language. Existing methods simply divide this task into two steps: summarization and translation, leading to the problem of error propagation. To handle that, we present an end-to-end CLS framework, which we refer to as Neural Cross-Lingual Summarization (NCLS), for the first time. Moreover, we propose to further improve NCLS by incorporating two related tasks, monolingual summarization and machine translation, into the training process of CLS under multi-task learning. Due to the lack of supervised CLS data, we propose a round-trip translation strategy to acquire two high-quality large-scale CLS datasets based on existing monolingual summarization datasets. Experimental results have shown that our NCLS achieves remarkable improvement over traditional pipeline methods on both English-to-Chinese and Chinese-to-English CLS human-corrected test sets. In addition, NCLS with multi-task learning can further significantly improve the quality of generated summaries. We make our dataset and code publicly available here: http://www.nlpr.ia.ac.cn/cip/dataset.htm.
LGJul 1, 2019
Understanding Memory Modules on Learning Simple AlgorithmsKexin Wang, Yu Zhou, Shaonan Wang et al.
Recent work has shown that memory modules are crucial for the generalization ability of neural networks on learning simple algorithms. However, we still have little understanding of the working mechanism of memory modules. To alleviate this problem, we apply a two-step analysis pipeline consisting of first inferring hypothesis about what strategy the model has learned according to visualization and then verify it by a novel proposed qualitative analysis method based on dimension reduction. Using this method, we have analyzed two popular memory-augmented neural networks, neural Turing machine and stack-augmented neural network on two simple algorithm tasks including reversing a random sequence and evaluation of arithmetic expressions. Results have shown that on the former task both models can learn to generalize and on the latter task only the stack-augmented model can do so. We show that different strategies are learned by the models, in which specific categories of input are monitored and different policies are made based on that to change the memory.
CLJan 2, 2018
Learning Multimodal Word Representation via Dynamic Fusion MethodsShaonan Wang, Jiajun Zhang, Chengqing Zong
Multimodal models have been proven to outperform text-based models on learning semantic word representations. Almost all previous multimodal models typically treat the representations from different modalities equally. However, it is obvious that information from different modalities contributes differently to the meaning of words. This motivates us to build a multimodal model that can dynamically fuse the semantic representations from different modalities according to different types of words. To that end, we propose three novel dynamic fusion methods to assign importance weights to each modality, in which weights are learned under the weak supervision of word association pairs. The extensive experiments have demonstrated that the proposed methods outperform strong unimodal baselines and state-of-the-art multimodal models.
CLNov 15, 2017
Investigating Inner Properties of Multimodal Representation and Semantic Compositionality with Brain-based Componential SemanticsShaonan Wang, Jiajun Zhang, Nan Lin et al.
Multimodal models have been proven to outperform text-based approaches on learning semantic representations. However, it still remains unclear what properties are encoded in multimodal representations, in what aspects do they outperform the single-modality representations, and what happened in the process of semantic compositionality in different input modalities. Considering that multimodal models are originally motivated by human concept representations, we assume that correlating multimodal representations with brain-based semantics would interpret their inner properties to answer the above questions. To that end, we propose simple interpretation methods based on brain-based componential semantics. First we investigate the inner properties of multimodal representations by correlating them with corresponding brain-based property vectors. Then we map the distributed vector space to the interpretable brain-based componential space to explore the inner properties of semantic compositionality. Ultimately, the present paper sheds light on the fundamental questions of natural language understanding, such as how to represent the meaning of words and how to combine word meanings into larger units.
CLSep 29, 2016
Learning Sentence Representation with Guidance of Human AttentionShaonan Wang, Jiajun Zhang, Chengqing Zong
Recently, much progress has been made in learning general-purpose sentence representations that can be used across domains. However, most of the existing models typically treat each word in a sentence equally. In contrast, extensive studies have proven that human read sentences efficiently by making a sequence of fixation and saccades. This motivates us to improve sentence representations by assigning different weights to the vectors of the component words, which can be treated as an attention mechanism on single sentences. To that end, we propose two novel attention models, in which the attention weights are derived using significant predictors of human reading time, i.e., Surprisal, POS tags and CCG supertags. The extensive experiments demonstrate that the proposed methods significantly improve upon the state-of-the-art sentence representation models.