Tieyun Qian

CL
h-index9
30papers
380citations
Novelty52%
AI Score58

30 Papers

CLFeb 13, 2023Code
Type-Aware Decomposed Framework for Few-Shot Named Entity Recognition

Yongqi Li, Yu Yu, Tieyun Qian

Despite the recent success achieved by several two-stage prototypical networks in few-shot named entity recognition (NER) task, the overdetected false spans at the span detection stage and the inaccurate and unstable prototypes at the type classification stage remain to be challenging problems. In this paper, we propose a novel Type-Aware Decomposed framework, namely TadNER, to solve these problems. We first present a type-aware span filtering strategy to filter out false spans by removing those semantically far away from type names. We then present a type-aware contrastive learning strategy to construct more accurate and stable prototypes by jointly exploiting support samples and type names as references. Extensive experiments on various benchmarks prove that our proposed TadNER framework yields a new state-of-the-art performance. Our code and data will be available at https://github.com/NLPWM-WHU/TadNER.

CLJun 11, 2023
Mimicking the Thinking Process for Emotion Recognition in Conversation with Prompts and Paraphrasing

Ting Zhang, Zhuang Chen, Ming Zhong et al.

Emotion recognition in conversation, which aims to predict the emotion for all utterances, has attracted considerable research attention in recent years. It is a challenging task since the recognition of the emotion in one utterance involves many complex factors, such as the conversational context, the speaker's background, and the subtle difference between emotion labels. In this paper, we propose a novel framework which mimics the thinking process when modeling these factors. Specifically, we first comprehend the conversational context with a history-oriented prompt to selectively gather information from predecessors of the target utterance. We then model the speaker's background with an experience-oriented prompt to retrieve the similar utterances from all conversations. We finally differentiate the subtle label semantics with a paraphrasing mechanism to elicit the intrinsic label related knowledge. We conducted extensive experiments on three benchmarks. The empirical results demonstrate the superiority of our proposed framework over the state-of-the-art baselines.

AINov 8, 2025Code
Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Zepeng Bao, Shen Zhou, Qiankun Pi et al.

Hallucination in large language models (LLMs) remains a critical barrier to their safe deployment. Existing tool-augmented hallucination detection methods require pre-defined fixed verification strategies, which are crucial to the quality and effectiveness of tool calls. Some methods directly employ powerful closed-source LLMs such as GPT-4 as detectors, which are effective but too costly. To mitigate the cost issue, some methods adopt the teacher-student architecture and finetune open-source small models as detectors via agent tuning. However, these methods are limited by fixed strategies. When faced with a dynamically changing execution environment, they may lack adaptability and inappropriately call tools, ultimately leading to detection failure. To address the problem of insufficient strategy adaptability, we propose the innovative ``Learning to Evaluate and Adaptively Plan''(LEAP) framework, which endows an efficient student model with the dynamic learning and proactive correction capabilities of the teacher model. Specifically, our method formulates the hallucination detection problem as a dynamic strategy learning problem. We first employ a teacher model to generate trajectories within the dynamic learning loop and dynamically adjust the strategy based on execution failures. We then distill this dynamic planning capability into an efficient student model via agent tuning. Finally, during strategy execution, the student model adopts a proactive correction mechanism, enabling it to propose, review, and optimize its own verification strategies before execution. We demonstrate through experiments on three challenging benchmarks that our LEAP-tuned model outperforms existing state-of-the-art methods.

LGDec 19, 2025
Understanding Generalization in Role-Playing Models via Information Theory

Yongqi Li, Hao Lang, Fei Huang et al.

Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, character, and dialogue compositional shifts. Existing methods like LLM-as-a-judge fall short in providing a fine-grained diagnosis of how these shifts affect RPM generalization, and thus there lack formal frameworks to characterize RPM generalization behaviors. To bridge these gaps, we introduce an information-theoretic metric, named reasoning-based effective mutual information difference (R-EMID), to measure RPM performance degradation in an interpretable way. We also derive an upper bound on R-EMID to predict the worst-case generalization performance of RPMs and theoretically reveal how various shifts contribute to the RPM performance degradation. Moreover, we propose a co-evolving reinforcement learning framework to adaptively model the connection among user, character, and dialogue context and thus enhance the estimation of dialogue response generation probability, which is critical for calculating R-EMID. Finally, we evaluate the generalization performance of various RPMs using R-EMID, finding that user shift poses the highest risk among all shifts and reinforcement learning is the most effective approach for enhancing RPM generalization.

CLJan 12
Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions

Yongqi Li, Hao Lang, Tieyun Qian et al.

Vision-language models are increasingly employed as multimodal conversational agents (MCAs) for diverse conversational tasks. Recently, reinforcement learning (RL) has been widely explored for adapting MCAs to various human-AI interaction scenarios. Despite showing great enhancement in generalization performance, fine-tuning MCAs via RL still faces challenges in handling the extremely large text token space. To address this, we learn a compact latent action space for RL fine-tuning instead. Specifically, we adopt the learning from observation mechanism to construct the codebook for the latent action space, where future observations are leveraged to estimate current latent actions that could further be used to reconstruct future observations. However, the scarcity of paired image-text data hinders learning a codebook with sufficient coverage. Thus, we leverage both paired image-text data and text-only data to construct the latent action space, using a cross-modal projector for transforming text embeddings into image-text embeddings. We initialize the cross-modal projector on paired image-text data, and further train it on massive text-only data with a novel cycle consistency loss to enhance its robustness. We show that our latent action based method outperforms competitive baselines on two conversation tasks across various RL algorithms.

AIApr 14
Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs

Jianhao Chen, Haoyang Chen, Hanjie Zhao et al.

The rapid evolution of Vision-Language Models (VLMs) has catalyzed unprecedented capabilities in artificial intelligence; however, this continuous modal expansion has inadvertently exposed a vastly broadened and unconstrained adversarial attack surface. Current multimodal jailbreak strategies primarily focus on surface-level pixel perturbations and typographic attacks or harmful images; however, they fail to engage with the complex semantic structures intrinsic to visual data. This leaves the vast semantic attack surface of original, natural images largely unscrutinized. Driven by the need to expose these deep-seated semantic vulnerabilities, we introduce \textbf{MemJack}, a \textbf{MEM}ory-augmented multi-agent \textbf{JA}ilbreak atta\textbf{CK} framework that explicitly leverages visual semantics to orchestrate automated jailbreak attacks. MemJack employs coordinated multi-agent cooperation to dynamically map visual entities to malicious intents, generate adversarial prompts via multi-angle visual-semantic camouflage, and utilize an Iterative Nullspace Projection (INLP) geometric filter to bypass premature latent space refusals. By accumulating and transferring successful strategies through a persistent Multimodal Experience Memory, MemJack maintains highly coherent extended multi-turn jailbreak attack interactions across different images, thereby improving the attack success rate (ASR) on new images. Extensive empirical evaluations across full, unmodified COCO val2017 images demonstrate that MemJack achieves a 71.48\% ASR against Qwen3-VL-Plus, scaling to 90\% under extended budgets. Furthermore, to catalyze future defensive alignment research, we will release \textbf{MemJack-Bench}, a comprehensive dataset comprising over 113,000 interactive multimodal jailbreak attack trajectories, establishing a vital foundation for developing inherently robust VLMs.

CRJul 8, 2025Code
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

Xiaohu Li, Yunfeng Ning, Zepeng Bao et al.

Security alignment enables the Large Language Model (LLM) to gain the protection against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have isolated LLM jailbreak attacks and defenses. We analyze the security protection mechanism of the LLM, and propose a framework that combines attack and defense. Our method is based on the linearly separable property of LLM intermediate layer embedding, as well as the essence of jailbreak attack, which aims to embed harmful problems and transfer them to the safe area. We utilize generative adversarial network (GAN) to learn the security judgment boundary inside the LLM to achieve efficient jailbreak attack and defense. The experimental results indicate that our method achieves an average jailbreak success rate of 88.85\% across three popular LLMs, while the defense success rate on the state-of-the-art jailbreak dataset reaches an average of 84.17\%. This not only validates the effectiveness of our approach but also sheds light on the internal security mechanisms of LLMs, offering new insights for enhancing model security The code and data are available at https://github.com/NLPGM/CAVGAN.

CLMar 16
ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

Hankun Kang, Xin Miao, Jianhao Chen et al.

Toxicity detection mitigates the dissemination of toxic content (e.g., hateful comments, posts, and messages within online social actions) to safeguard a healthy online social environment. However, malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detectors or methods are static over time and are inadequate in addressing these evolving evasion tactics. Thus, continual learning emerges as a logical approach to dynamically update detection ability against evolving perturbations. Nevertheless, disparities across perturbations hinder the detector's continual learning on perturbed text. More importantly, perturbation-induced noises distort semantics to degrade comprehension and also impair critical feature learning to render detection sensitive to perturbations. These amplify the challenge of continual learning against evolving perturbations. In this work, we present ContiGuard, the first framework tailored for continual learning of the detector on time-evolving perturbed text (termed continual toxicity detection) to enable the detector to continually update capability and maintain sustained resilience against evolving perturbations. Specifically, to boost the comprehension, we present an LLM-powered semantic enriching strategy, where we dynamically incorporate possible meaning and toxicity-related clues excavated by LLM into the perturbed text to improve the comprehension. To mitigate non-critical features and amplify critical ones, we propose a discriminability-driven feature learning strategy, where we strengthen discriminative features while suppressing the less-discriminative ones to shape a robust classification boundary for detection...

AIAug 18, 2025Code
FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance

Jianhao Chen, Mayi Xu, Xiaohu Li et al.

Large Reasoning Models (LRMs) have demonstrated impressive performance across various tasks due to their powerful reasoning capabilities. However, their safety performance remains a significant concern. In this paper, we explore the reasons behind the vulnerability of LRMs. Based on this, we propose a novel method to improve the safety of LLMs without sacrificing their reasoning capability. Specifically, we exploit the competition between LRM's reasoning ability and safety ability, and achieve jailbreak by improving LRM's reasoning performance to reduce its safety performance. We then introduce an alignment strategy based on Fuzzification to balance Safety-Reasoning (FuSaR), by detoxifying the harmful reasoning process, where both the dangerous entities and the dangerous procedures in the reasoning steps are hidden. FuSaR successfully mitigates safety risks while preserving core reasoning information. We validate this strategy through alignment experiments on several open-source LRMs using detoxified reasoning data. The results compared with existing baselines conclusively show that FuSaR is an efficient alignment strategy to simultaneously enhance both the reasoning capability and safety of LRMs.

CLAug 12, 2025Code
A Survey on Training-free Alignment of Large Language Models

Birong Pan, Yongqi Li, Weiyu Zhang et al.

The alignment of large language models (LLMs) aims to ensure their outputs adhere to human values, ethical standards, and legal norms. Traditional alignment methods often rely on resource-intensive fine-tuning (FT), which may suffer from knowledge degradation and face challenges in scenarios where the model accessibility or computational resources are constrained. In contrast, training-free (TF) alignment techniques--leveraging in-context learning, decoding-time adjustments, and post-generation corrections--offer a promising alternative by enabling alignment without heavily retraining LLMs, making them adaptable to both open-source and closed-source environments. This paper presents the first systematic review of TF alignment methods, categorizing them by stages of pre-decoding, in-decoding, and post-decoding. For each stage, we provide a detailed examination from the viewpoint of LLMs and multimodal LLMs (MLLMs), highlighting their mechanisms and limitations. Furthermore, we identify key challenges and future directions, paving the way for more inclusive and effective TF alignment techniques. By synthesizing and organizing the rapidly growing body of research, this survey offers a guidance for practitioners and advances the development of safer and more reliable LLMs.

AIJun 1, 2025Code
Aligning VLM Assistants with Personalized Situated Cognition

Yongqi Li, Shen Zhou, Xiaohu Li et al.

Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants of humans in managing visual tasks. However, people with diversified backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose to evaluate the individuals' actions to examine whether the personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which constructs a cognition-aware and action-based reward model for personalized alignment. Experimental results and human evaluations demonstrate the reliability of the PCogAlignBench and the effectiveness of our proposed PCogAlign. We will open-source the constructed benchmark and code at https://github.com/NLPGM/PCogAlign.

CRDec 17, 2024Code
Toxicity Detection towards Adaptability to Changing Perturbations

Hankun Kang, Jianhao Chen, Yongqi Li et al.

Toxicity detection is crucial for maintaining the peace of the society. While existing methods perform well on normal toxic contents or those generated by specific perturbation methods, they are vulnerable to evolving perturbation patterns. However, in real-world scenarios, malicious users tend to create new perturbation patterns for fooling the detectors. For example, some users may circumvent the detector of large language models (LLMs) by adding `I am a scientist' at the beginning of the prompt. In this paper, we introduce a novel problem, i.e., continual learning jailbreak perturbation patterns, into the toxicity detection field. To tackle this problem, we first construct a new dataset generated by 9 types of perturbation patterns, 7 of them are summarized from prior work and 2 of them are developed by us. We then systematically validate the vulnerability of current methods on this new perturbation pattern-aware dataset via both the zero-shot and fine tuned cross-pattern detection. Upon this, we present the domain incremental learning paradigm and the corresponding benchmark to ensure the detector's robustness to dynamically emerging types of perturbed toxic text. Our code and dataset are provided in the appendix and will be publicly available at GitHub, by which we wish to offer new research opportunities for the security-relevant communities.

CLMay 3, 2023Code
Generative Meta-Learning for Zero-Shot Relation Triplet Extraction

Wanli Li, Tieyun Qian, Yi Song et al.

Zero-shot Relation Triplet Extraction (ZeroRTE) aims to extract relation triplets from texts containing unseen relation types. This capability benefits various downstream information retrieval (IR) tasks. The primary challenge lies in enabling models to generalize effectively to unseen relation categories. Existing approaches typically leverage the knowledge embedded in pre-trained language models to accomplish the generalization process. However, these methods focus solely on fitting the training data during training, without specifically improving the model's generalization performance, resulting in limited generalization capability. For this reason, we explore the integration of bi-level optimization (BLO) with pre-trained language models for learning generalized knowledge directly from the training data, and propose a generative meta-learning framework which exploits the `learning-to-learn' ability of meta-learning to boost the generalization capability of generative models. Specifically, we introduce a BLO approach that simultaneously addresses data fitting and generalization. This is achieved by constructing an upper-level loss to focus on generalization and a lower-level loss to ensure accurate data fitting. Building on this, we subsequently develop three generative meta-learning methods, each tailored to a distinct category of meta-learning. Extensive experimental results demonstrate that our framework performs well on the ZeroRTE task. Our code is available at https://github.com/leeworry/TGM-MetaLearning.

CLJan 2, 2025
Reasoning based on symbolic and parametric knowledge bases: a survey

Mayi Xu, Yunfeng Ning, Yongqi Li et al.

Reasoning is fundamental to human intelligence, and critical for problem-solving, decision-making, and critical thinking. Reasoning refers to drawing new conclusions based on existing knowledge, which can support various applications like clinical diagnosis, basic education, and financial analysis. Though a good number of surveys have been proposed for reviewing reasoning-related methods, none of them has systematically investigated these methods from the viewpoint of their dependent knowledge base. Both the scenarios to which the knowledge bases are applied and their storage formats are significantly different. Hence, investigating reasoning methods from the knowledge base perspective helps us better understand the challenges and future directions. To fill this gap, this paper first classifies the knowledge base into symbolic and parametric ones. The former explicitly stores information in human-readable symbols, and the latter implicitly encodes knowledge within parameters. Then, we provide a comprehensive overview of reasoning methods using symbolic knowledge bases, parametric knowledge bases, and both of them. Finally, we identify the future direction toward enhancing reasoning capabilities to bridge the gap between human and machine intelligence.

CLAug 12, 2025
Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering

Yunfeng Ning, Mayi Xu, Jintao Wen et al.

LLMs often suffer from hallucinations and outdated or incomplete knowledge. RAG is proposed to address these issues by integrating external knowledge like that in KGs into LLMs. However, leveraging private KGs in RAG systems poses significant privacy risks due to the black-box nature of LLMs and potential insecure data transmission, especially when using third-party LLM APIs lacking transparency and control. In this paper, we investigate the privacy-protected RAG scenario for the first time, where entities in KGs are anonymous for LLMs, thus preventing them from accessing entity semantics. Due to the loss of semantics of entities, previous RAG systems cannot retrieve question-relevant knowledge from KGs by matching questions with the meaningless identifiers of anonymous entities. To realize an effective RAG system in this scenario, two key challenges must be addressed: (1) How can anonymous entities be converted into retrievable information. (2) How to retrieve question-relevant anonymous entities. Hence, we propose a novel ARoG framework including relation-centric abstraction and structure-oriented abstraction strategies. For challenge (1), the first strategy abstracts entities into high-level concepts by dynamically capturing the semantics of their adjacent relations. It supplements meaningful semantics which can further support the retrieval process. For challenge (2), the second strategy transforms unstructured natural language questions into structured abstract concept paths. These paths can be more effectively aligned with the abstracted concepts in KGs, thereby improving retrieval performance. To guide LLMs to effectively retrieve knowledge from KGs, the two strategies strictly protect privacy from being exposed to LLMs. Experiments on three datasets demonstrate that ARoG achieves strong performance and privacy-robustness.

SIFeb 15
Beyond Static Snapshots: Dynamic Modeling and Forecasting of Group-Level Value Evolution with Large Language Models

Qiankun Pi, Guixin Su, Jinliang Li et al.

Social simulation is critical for mining complex social dynamics and supporting data-driven decision making. LLM-based methods have emerged as powerful tools for this task by leveraging human-like social questionnaire responses to model group behaviors. Existing LLM-based approaches predominantly focus on group-level values at discrete time points, treating them as static snapshots rather than dynamic processes. However, group-level values are not fixed but shaped by long-term social changes. Modeling their dynamics is thus crucial for accurate social evolution prediction--a key challenge in both data mining and social science. This problem remains underexplored due to limited longitudinal data, group heterogeneity, and intricate historical event impacts. To bridge this gap, we propose a novel framework for group-level dynamic social simulation by integrating historical value trajectories into LLM-based human response modeling. We select China and the U.S. as representative contexts, conducting stratified simulations across four core sociodemographic dimensions (gender, age, education, income). Using the World Values Survey, we construct a multi-wave, group-level longitudinal dataset to capture historical value evolution, and then propose the first event-based prediction method for this task, unifying social events, current value states, and group attributes into a single framework. Evaluations across five LLM families show substantial gains: a maximum 30.88\% improvement on seen questions and 33.97\% on unseen questions over the Vanilla baseline. We further find notable cross-group heterogeneity: U.S. groups are more volatile than Chinese groups, and younger groups in both countries are more sensitive to external changes. These findings advance LLM-based social simulation and provide new insights for social scientists to understand and predict social value changes.

LGAug 13, 2025
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs

Birong Pan, Mayi Xu, Qiankun Pi et al.

Ensuring robust safety alignment while preserving utility is critical for the reliable deployment of Large Language Models (LLMs). However, current techniques fundamentally suffer from intertwined deficiencies: insufficient robustness against malicious attacks, frequent refusal of benign queries, degradation in generated text quality and general task performance--the former two reflecting deficits in robust safety and the latter constituting utility impairment. We trace these limitations to the coarse-grained layer-wise interventions in existing methods. To resolve this, we propose NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to achieve simultaneous safety-utility optimization. Our approach first identifies safety-critical and utility-preserving neurons across all layers via attribution, then employs meta-learning to adaptively amplify safety-neuron activations and suppress utility-neuron activations. Crucially, NeuronTune enables tunable adjustment of intervention scope via neuron-count thresholds, supporting flexible adaptation to security-critical or utility-priority scenarios. Extensive experimental results demonstrate that our method significantly outperforms existing state-of-the-art technologies, achieving superior model safety while maintaining excellent utility.

CLAug 13, 2025
Format as a Prior: Quantifying and Analyzing Bias in LLMs for Heterogeneous Data

Jiacheng Liu, Mayi Xu, Qiankun Pi et al.

Large Language Models (LLMs) are increasingly employed in applications that require processing information from heterogeneous formats, including texts, tables, infoboxes, and knowledge graphs. However, systematic biases toward particular formats may undermine LLMs' ability to integrate heterogeneous data impartially, potentially resulting in reasoning errors and increased risks in downstream tasks. Yet it remains unclear whether such biases are systematic, which data-level factors drive them, and what internal mechanisms underlie their emergence. In this paper, we present the first comprehensive study of format bias in LLMs through a three-stage empirical analysis. The first stage explores the presence and direction of bias across a diverse range of LLMs. The second stage examines how key data-level factors influence these biases. The third stage analyzes how format bias emerges within LLMs' attention patterns and evaluates a lightweight intervention to test its effectiveness. Our results show that format bias is consistent across model families, driven by information richness, structure quality, and representation type, and is closely associated with attention imbalance within the LLMs. Based on these investigations, we identify three future research directions to reduce format bias: enhancing data pre-processing through format repair and normalization, introducing inference-time interventions such as attention re-weighting, and developing format-balanced training corpora. These directions will support the design of more robust and fair heterogeneous data processing systems.

CLDec 10, 2024
Enhancing Relation Extraction via Supervised Rationale Verification and Feedback

Yongqi Li, Xin Miao, Shen Zhou et al.

Despite the rapid progress that existing automated feedback methods have made in correcting the output of large language models (LLMs), these methods cannot be well applied to the relation extraction (RE) task due to their designated feedback objectives and correction manner. To address this problem, we propose a novel automated feedback framework for RE, which presents a rationale supervisor to verify the rationale and provides re-selected demonstrations as feedback to correct the initial prediction. Specifically, we first design a causal intervention and observation method to collect biased/unbiased rationales for contrastive training the rationale supervisor. Then, we present a verification-feedback-correction procedure to iteratively enhance LLMs' capability of handling the RE task. Extensive experiments prove that our proposed framework significantly outperforms existing methods.

LGNov 29, 2024
Origin-Destination Demand Prediction: An Urban Radiation and Attraction Perspective

Xuan Ma, Zepeng Bao, Ming Zhong et al.

In recent years, origin-destination (OD) demand prediction has gained significant attention for its profound implications in urban development. Existing data-driven deep learning methods primarily focus on the spatial or temporal dependency between regions yet neglecting regions' fundamental functional difference. Though knowledge-driven physical methods have characterised regions' functions by their radiation and attraction capacities, these functions are defined on numerical factors like population without considering regions' intrinsic nominal attributes, e.g., a region is a residential or industrial district. Moreover, the complicated relationships between two types of capacities, e.g., the radiation capacity of a residential district in the morning will be transformed into the attraction capacity in the evening, are totally missing from physical methods. In this paper, we not only generalize the physical radiation and attraction capacities into the deep learning framework with the extended capability to fulfil regions' functions, but also present a new model that captures the relationships between two types of capacities. Specifically, we first model regions' radiation and attraction capacities using a bilateral branch network, each equipped with regions' attribute representations. We then describe the transformation relationship of different capacities of the same region using a hypergraph-based parameter generation method. We finally unveil the competition relationship of different regions with the same attraction capacity through cluster-based adversarial learning. Extensive experiments on two datasets demonstrate the consistent improvements of our method over the state-of-the-art baselines, as well as the good explainability of regions' functions using their nominal attributes.

CLMay 24, 2023
Prompting Large Language Models for Counterfactual Generation: An Empirical Study

Yongqi Li, Mayi Xu, Xin Miao et al.

Large language models (LLMs) have made remarkable progress in a wide range of natural language understanding and generation tasks. However, their ability to generate counterfactuals has not been examined systematically. To bridge this gap, we present a comprehensive evaluation framework on various types of NLU tasks, which covers all key factors in determining LLMs' capability of generating counterfactuals. Based on this framework, we 1) investigate the strengths and weaknesses of LLMs as the counterfactual generator, and 2) disclose the factors that affect LLMs when generating counterfactuals, including both the intrinsic properties of LLMs and prompt designing. The results show that, though LLMs are promising in most cases, they face challenges in complex tasks like RE since they are bounded by task-specific performance, entity constraints, and inherent selection bias. We also find that alignment techniques, e.g., instruction-tuning and reinforcement learning from human feedback, may potentially enhance the counterfactual generation ability of LLMs. On the contrary, simply increasing the parameter size does not yield the desired improvements. Besides, from the perspective of prompt designing, task guidelines unsurprisingly play an important role. However, the chain-of-thought approach does not always help due to inconsistency issues.

CLFeb 22, 2022
Automatically Generating Counterfactuals for Relation Classification

Mi Zhang, Tieyun Qian, Ting Zhang

The goal of relation classification (RC) is to extract the semantic relations between/among entities in the text. As a fundamental task in natural language processing, it is crucial to ensure the robustness of RC models. Despite the high accuracy current deep neural models have achieved in RC tasks, they are easily affected by spurious correlations. One solution to this problem is to train the model with counterfactually augmented data (CAD) such that it can learn the causation rather than the confounding. However, no attempt has been made on generating counterfactuals for RC tasks. In this paper, we formulate the problem of automatically generating CAD for RC tasks from an entity-centric viewpoint, and develop a novel approach to derive contextual counterfactuals for entities. Specifically, we exploit two elementary topological properties, i.e., the centrality and the shortest path, in syntactic and semantic dependency graphs, to first identify and then intervene on the contextual causal features for entities. We conduct a comprehensive evaluation on four RC datasets by combining our proposed approach with a variety of backbone RC models. The results demonstrate that our approach not only improves the performance of the backbones, but also makes them more robust in the out-of-domain test.

CLDec 2, 2021
From Consensus to Disagreement: Multi-Teacher Distillation for Semi-Supervised Relation Extraction

Wanli Li, Tieyun Qian

Lack of labeled data is a main obstacle in relation extraction. Semi-supervised relation extraction (SSRE) has been proven to be a promising way for this problem through annotating unlabeled samples as additional training data. Almost all prior researches along this line adopt multiple models to make the annotations more reliable by taking the intersection set of predicted results from these models. However, the difference set, which contains rich information about unlabeled data, has been long neglected by prior studies. In this paper, we propose to learn not only from the consensus but also the disagreement among different models in SSRE. To this end, we develop a simple and general multi-teacher distillation (MTD) framework, which can be easily integrated into any existing SSRE methods. Specifically, we first let the teachers correspond to the multiple models and select the samples in the intersection set of the last iteration in SSRE methods to augment labeled data as usual. We then transfer the class distributions for samples in the difference set as soft labels to guide the student. We finally perform prediction using the trained student model. Experimental results on two public datasets demonstrate that our framework significantly promotes the performance of the base SSRE methods with pretty low computational cost.

CLJul 28, 2021
Multi-Scale Feature and Metric Learning for Relation Extraction

Mi Zhang, Tieyun Qian

Existing methods in relation extraction have leveraged the lexical features in the word sequence and the syntactic features in the parse tree. Though effective, the lexical features extracted from the successive word sequence may introduce some noise that has little or no meaningful content. Meanwhile, the syntactic features are usually encoded via graph convolutional networks which have restricted receptive field. To address the above limitations, we propose a multi-scale feature and metric learning framework for relation extraction. Specifically, we first develop a multi-scale convolutional neural network to aggregate the non-successive mainstays in the lexical sequence. We also design a multi-scale graph convolutional network which can increase the receptive field towards specific syntactic roles. Moreover, we present a multi-scale metric learning paradigm to exploit both the feature-level relation between lexical and syntactic features and the sample-level relation between instances with the same or different classes. We conduct extensive experiments on three real world datasets for various types of relation extraction tasks. The results demonstrate that our model significantly outperforms the state-of-the-art approaches.

IRJun 28, 2021
Intent Disentanglement and Feature Self-supervision for Novel Recommendation

Tieyun Qian, Yile Liang, Qing Li et al.

One key property in recommender systems is the long-tail distribution in user-item interactions where most items only have few user feedback. Improving the recommendation of tail items can promote novelty and bring positive effects to both users and providers, and thus is a desirable property of recommender systems. Current novel recommendation studies over-emphasize the importance of tail items without differentiating the degree of users' intent on popularity and often incur a sharp decline of accuracy. Moreover, none of existing methods has ever taken the extreme case of tail items, i.e., cold-start items without any interaction, into consideration. In this work, we first disclose the mechanism that drives a user's interaction towards popular or niche items by disentangling her intent into conformity influence (popularity) and personal interests (preference). We then present a unified end-to-end framework to simultaneously optimize accuracy and novelty targets based on the disentangled intent of popularity and that of preference. We further develop a new paradigm for novel recommendation of cold-start items which exploits the self-supervised learning technique to model the correlation between collaborative features and content features. We conduct extensive experimental results on three real-world datasets. The results demonstrate that our proposed model yields significant improvements over the state-of-the-art baselines in terms of accuracy, novelty, coverage, and trade-off.

IRJan 4, 2021
Recommending Accurate and Diverse Items Using Bilateral Branch Network

Yile Liang, Tieyun Qian

Recommender systems have played a vital role in online platforms due to the ability of incorporating users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor in recommendation to broaden user's horizons as well as to promote enterprises' sales. However, the trading-off between accuracy and diversity remains to be a big challenge, and the data and user biases have not been explored yet. In this paper, we develop an adaptive learning framework for accurate and diversified recommendation. We generalize recent proposed bi-lateral branch network in the computer vision community from image classification to item recommendation. Specifically, we encode domain level diversity by adaptively balancing accurate recommendation in the conventional branch and diversified recommendation in the adaptive branch of a bilateral branch network. We also capture user level diversity using a two-way adaptive metric learning backbone network in each branch. We conduct extensive experiments on three real-world datasets. Results demonstrate that our proposed approach consistently outperforms the state-of-the-art baselines.

CLOct 22, 2020
Exploit Multiple Reference Graphs for Semi-supervised Relation Extraction

Wanli Li, Tieyun Qian

Manual annotation of the labeled data for relation extraction is time-consuming and labor-intensive. Semi-supervised methods can offer helping hands for this problem and have aroused great research interests. Existing work focuses on mapping the unlabeled samples to the classes to augment the labeled dataset. However, it is hard to find an overall good mapping function, especially for the samples with complicated syntactic components in one sentence. To tackle this limitation, we propose to build the connection between the unlabeled data and the labeled ones rather than directly mapping the unlabeled samples to the classes. Specifically, we first use three kinds of information to construct reference graphs, including entity reference, verb reference, and semantics reference. The goal is to semantically or lexically connect the unlabeled sample(s) to the labeled one(s). Then, we develop a Multiple Reference Graph (MRefG) model to exploit the reference information for better recognizing high-quality unlabeled samples. The effectiveness of our method is demonstrated by extensive comparison experiments with the state-of-the-art baselines on two public datasets.

IRDec 28, 2019
Solving Cold Start Problem in Recommendation with Attribute Graph Neural Networks

Tieyun Qian, Yile Liang, Qing Li

Matrix completion is a classic problem underlying recommender systems. It is traditionally tackled with matrix factorization. Recently, deep learning based methods, especially graph neural networks, have made impressive progress on this problem. Despite their effectiveness, existing methods focus on modeling the user-item interaction graph. The inherent drawback of such methods is that their performance is bound to the density of the interactions, which is however usually of high sparsity. More importantly, for a cold start user/item that does not have any interactions, such methods are unable to learn the preference embedding of the user/item since there is no link to this user/item in the graph. In this work, we develop a novel framework Attribute Graph Neural Networks (AGNN) by exploiting the attribute graph rather than the commonly used interaction graph. This leads to the capability of learning embeddings for cold start users/items. Our AGNN can produce the preference embedding for a cold user/item by learning on the distribution of attributes with an extended variational auto-encoder structure. Moreover, we propose a new graph neural network variant, i.e., gated-GNN, to effectively aggregate various attributes of different modalities in a neighborhood. Empirical results on two real-world datasets demonstrate that our model yields significant improvements for cold start recommendations and outperforms or matches state-of-the-arts performance in the warm start scenario.

IRDec 16, 2019
Seq2seq Translation Model for Sequential Recommendation

Ke Sun, Tieyun Qian

The context information such as product category plays a critical role in sequential recommendation. Recent years have witnessed a growing interest in context-aware sequential recommender systems. Existing studies often treat the contexts as auxiliary feature vectors without considering the sequential dependency in contexts. However, such a dependency provides valuable clues to predict the user's future behavior. For example, a user might buy electronic accessories after he/she buy an electronic product. In this paper, we propose a novel seq2seq translation architecture to highlight the importance of sequential dependency in contexts for sequential recommendation. Specifically, we first construct a collateral context sequence in addition to the main interaction sequence. We then generalize recent advancements in translation model from sequences of words in two languages to sequences of items and contexts in recommender systems. Taking the category information as an item's context, we develop a basic coupled and an extended tripled seq2seq translation models to encode the category-item and item-category-item relations between the item and context sequences. We conduct extensive experiments on three real world datasets. The results demonstrate the superior performance of the proposed model compared with the state-of-the-art baselines.

IRApr 28, 2017
Learning Spatiotemporal-Aware Representation for POI Recommendation

Bei Liu, Tieyun Qian, Bing Liu et al.

The wide spread of location-based social networks brings about a huge volume of user check-in data, which facilitates the recommendation of points of interest (POIs). Recent advances on distributed representation shed light on learning low dimensional dense vectors to alleviate the data sparsity problem. Current studies on representation learning for POI recommendation embed both users and POIs in a common latent space, and users' preference is inferred based on the distance/similarity between a user and a POI. Such an approach is not in accordance with the semantics of users and POIs as they are inherently different objects. In this paper, we present a novel spatiotemporal aware (STA) representation, which models the spatial and temporal information as \emph{a relationship connecting users and POIs}. Our model generalizes the recent advances in knowledge graph embedding. The basic idea is that the embedding of a $<$time, location$>$ pair corresponds to a translation from embeddings of users to POIs. Since the POI embedding should be close to the user embedding plus the relationship vector, the recommendation can be performed by selecting the top-\emph{k} POIs similar to the translated POI, which are all of the same type of objects. We conduct extensive experiments on two real-world datasets. The results demonstrate that our STA model achieves the state-of-the-art performance in terms of high recommendation accuracy, robustness to data sparsity and effectiveness in handling cold start problem.