Jianfeng Qu

CL
h-index15
18papers
1,676citations
Novelty46%
AI Score44

18 Papers

CLMar 7, 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Jiaan Wang, Yunlong Liang, Fandong Meng et al. · tsinghua

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For the meta-evaluation datasets which are created greatly depending on the reference and thus are biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.

CLMar 23, 2022
A Survey on Cross-Lingual Summarization

Jiaan Wang, Fandong Meng, Duo Zheng et al. · tsinghua

Cross-lingual summarization is the task of generating a summary in one language (e.g., English) for the given document(s) in a different language (e.g., Chinese). Under the globalization background, this task has attracted increasing attention of the computational linguistics community. Nevertheless, there still remains a lack of comprehensive review for this task. Therefore, we present the first systematic critical review on the datasets, approaches, and challenges in this field. Specifically, we carefully organize existing datasets and approaches according to different construction methods and solution paradigms, respectively. For each type of datasets or approaches, we thoroughly introduce and summarize previous efforts and further compare them with each other to provide deeper analyses. In the end, we also discuss promising directions and offer our thoughts to facilitate future research. This survey is for both beginners and experts in cross-lingual summarization, and we hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in this area.

CLFeb 28, 2023
Zero-Shot Cross-Lingual Summarization via Large Language Models

Jiaan Wang, Yunlong Liang, Fandong Meng et al. · tsinghua

Given a document in a source language, cross-lingual summarization (CLS) aims to generate a summary in a different target language. Recently, the emergence of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has attracted wide attention from the computational linguistics community. However, it is not yet known the performance of LLMs on CLS. In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms (i.e., end-to-end and pipeline), and provide a preliminary evaluation on the generated summaries. We find that ChatGPT and GPT-4 originally prefer to produce lengthy summaries with detailed information. These two LLMs can further balance informativeness and conciseness with the help of an interactive prompt, significantly improving their CLS performance. Experimental results on three widely-used CLS datasets show that GPT-4 achieves state-of-the-art zero-shot CLS performance, and performs competitively compared with the fine-tuned mBART-50. Moreover, we also find some multi-lingual and bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited zero-shot CLS ability. Due to the composite nature of CLS, which requires models to perform summarization and translation simultaneously, accomplishing this task in a zero-shot manner is even a challenge for LLMs. Therefore, we sincerely hope and recommend future LLM research could use CLS as a testbed.

AISep 3, 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning

Shangfei Zheng, Weiqing Wang, Jianfeng Qu et al.

Multi-modal knowledge graphs (MKGs) include not only the relation triplets, but also related multi-modal auxiliary data (i.e., texts and images), which enhance the diversity of knowledge. However, the natural incompleteness has significantly hindered the applications of MKGs. To tackle the problem, existing studies employ the embedding-based reasoning models to infer the missing knowledge after fusing the multi-modal features. However, the reasoning performance of these methods is limited due to the following problems: (1) ineffective fusion of multi-modal auxiliary features; (2) lack of complex reasoning ability as well as inability to conduct the multi-hop reasoning which is able to infer more missing knowledge. To overcome these problems, we propose a novel model entitled MMKGR (Multi-hop Multi-modal Knowledge Graph Reasoning). Specifically, the model contains the following two components: (1) a unified gate-attention network which is designed to generate effective multi-modal complementary features through sufficient attention interaction and noise reduction; (2) a complementary feature-aware reinforcement learning method which is proposed to predict missing elements by performing the multi-hop reasoning process, based on the features obtained in component (1). The experimental results demonstrate that MMKGR outperforms the state-of-the-art approaches in the MKG reasoning task.

AIMar 12, 2022
Ensemble Semi-supervised Entity Alignment via Cycle-teaching

Kexuan Xin, Zequn Sun, Wen Hua et al.

Entity alignment is to find identical entities in different knowledge graphs. Although embedding-based entity alignment has recently achieved remarkable progress, training data insufficiency remains a critical challenge. Conventional semi-supervised methods also suffer from the incorrect entity alignment in newly proposed training data. To resolve these issues, we design an iterative cycle-teaching framework for semi-supervised entity alignment. The key idea is to train multiple entity alignment models (called aligners) simultaneously and let each aligner iteratively teach its successor the proposed new entity alignment. We propose a diversity-aware alignment selection method to choose reliable entity alignment for each aligner. We also design a conflict resolution mechanism to resolve the alignment conflict when combining the new alignment of an aligner and that from its teacher. Besides, considering the influence of cycle-teaching order, we elaborately design a strategy to arrange the optimal order that can maximize the overall performance of multiple aligners. The cycle-teaching process can break the limitations of each model's learning capability and reduce the noise in new training data, leading to improved performance. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed cycle-teaching framework, which significantly outperforms the state-of-the-art models when the training data is insufficient and the new entity alignment has much noise.

LGAug 23, 2022
Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding

Kexuan Xin, Zequn Sun, Wen Hua et al.

Entity alignment is a crucial task in knowledge graph fusion. However, most entity alignment approaches have the scalability problem. Recent methods address this issue by dividing large KGs into small blocks for embedding and alignment learning in each. However, such a partitioning and learning process results in an excessive loss of structure and alignment. Therefore, in this work, we propose a scalable GNN-based entity alignment approach to reduce the structure and alignment loss from three perspectives. First, we propose a centrality-based subgraph generation algorithm to recall some landmark entities serving as the bridges between different subgraphs. Second, we introduce self-supervised entity reconstruction to recover entity representations from incomplete neighborhood subgraphs, and design cross-subgraph negative sampling to incorporate entities from other subgraphs in alignment learning. Third, during the inference process, we merge the embeddings of subgraphs to make a single space for alignment search. Experimental results on the benchmark OpenEA dataset and the proposed large DBpedia1M dataset verify the effectiveness of our approach.

CLDec 1, 2022
Long-Document Cross-Lingual Summarization

Shaohui Zheng, Zhixu Li, Jiaan Wang et al.

Cross-Lingual Summarization (CLS) aims at generating summaries in one language for the given documents in another language. CLS has attracted wide research attention due to its practical significance in the multi-lingual world. Though great contributions have been made, existing CLS works typically focus on short documents, such as news articles, short dialogues and guides. Different from these short texts, long documents such as academic articles and business reports usually discuss complicated subjects and consist of thousands of words, making them non-trivial to process and summarize. To promote CLS research on long documents, we construct Perseus, the first long-document CLS dataset which collects about 94K Chinese scientific documents paired with English summaries. The average length of documents in Perseus is more than two thousand tokens. As a preliminary study on long-document CLS, we build and evaluate various CLS baselines, including pipeline and end-to-end methods. Experimental results on Perseus show the superiority of the end-to-end baseline, outperforming the strong pipeline models equipped with sophisticated machine translation systems. Furthermore, to provide a deeper understanding, we manually analyze the model outputs and discuss specific challenges faced by current approaches. We hope that our work could benchmark long-document CLS and benefit future studies.

CLJul 17, 2022
RT-KGD: Relation Transition Aware Knowledge-Grounded Dialogue Generation

Kexin Wang, Zhixu Li, Jiaan Wang et al.

Grounding dialogue system with external knowledge is a promising way to improve the quality of responses. Most existing works adopt knowledge graphs (KGs) as the external resources, paying attention to the contribution of entities in the last utterance of the dialogue for context understanding and response generation. Nevertheless, the correlations between knowledge implied in the multi-turn context and the transition regularities between relations in KGs are under-explored. To this end, we propose a Relation Transition aware Knowledge-Grounded Dialogue Generation model (RT-KGD). Specifically, inspired by the latent logic of human conversation, our model integrates dialogue-level relation transition regularities with turn-level entity semantic information. In this manner, the interaction between knowledge is considered to produce abundant clues for predicting the appropriate knowledge and generating coherent responses. The experimental results on both automatic evaluation and manual evaluation indicate that our model outperforms state-of-the-art baselines.

CLJun 17, 2023
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model

Jiaan Wang, Jianfeng Qu, Yunlong Liang et al.

Constructing commonsense knowledge graphs (CKGs) has attracted wide research attention due to its significant importance in cognitive intelligence. Nevertheless, existing CKGs are typically oriented to English, limiting the research in non-English languages. Meanwhile, the emergence of foundation models like ChatGPT and GPT-4 has shown promising intelligence with the help of reinforcement learning from human feedback. Under the background, in this paper, we utilize foundation models to construct a Chinese CKG, named Snowman. Specifically, we distill different types of commonsense head items from ChatGPT, and continue to use it to collect tail items with respect to the head items and pre-defined relations. Based on the preliminary analysis, we find the negative commonsense knowledge distilled by ChatGPT achieves lower human acceptance compared to other knowledge. Therefore, we design a simple yet effective self-instruct filtering strategy to filter out invalid negative commonsense. Overall, the constructed Snowman covers more than ten million Chinese commonsense triples, making it the largest Chinese CKG. Moreover, human studies show the acceptance of Snowman achieves 90.6\%, indicating the high-quality triples distilled by the cutting-edge foundation model. We also conduct experiments on commonsense knowledge models to show the usability and effectiveness of our Snowman.

CLFeb 11, 2022Code
ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization

Jiaan Wang, Fandong Meng, Ziyao Lu et al.

We present ClidSum, a benchmark dataset for building cross-lingual summarization systems on dialogue documents. It consists of 67k+ dialogue documents from two subsets (i.e., SAMSum and MediaSum) and 112k+ annotated summaries in different target languages. Based on the proposed ClidSum, we introduce two benchmark settings for supervised and semi-supervised scenarios, respectively. We then build various baseline systems in different paradigms (pipeline and end-to-end) and conduct extensive experiments on ClidSum to provide deeper analyses. Furthermore, we propose mDialBART which extends mBART-50 (a multi-lingual BART) via further pre-training. The multiple objectives used in the further pre-training stage help the pre-trained model capture the structural characteristics as well as important content in dialogues and the transformation from source to the target language. Experimental results show the superiority of mDialBART, as an end-to-end model, outperforms strong pipeline models on ClidSum. Finally, we discuss specific challenges that current approaches faced with this task and give multiple promising directions for future research. We have released the dataset and code at https://github.com/krystalan/ClidSum.

SENov 3, 2025
Exploringand Unleashing the Power of Large Language Models in CI/CD Configuration Translation

Chong Wang, Chen Zhang, Jiajun Wu et al.

Continuous Integration (CI) is a cornerstone of modern collaborative software development, and numerous CI platforms are available. Differences in maintenance overhead, reliability, and integration depth with code-hosting platforms make migration between CI platforms a common practice. A central step in migration is translating CI configurations, which is challenging due to the intrinsic complexity of CI configurations and the need to understand semantic differences and relationships across CI platforms. With the advent of large language models (LLMs), recent advances in software engineering highlight their potential for CI configuration translation. In this paper, we present a study on LLM-based CI configuration translation, focusing on the migration from Travis CI to GitHub Actions. First, using 811 migration records, we quantify the effort involved and find that developers read an average of 38 lines of Travis configuration and write 58 lines of GitHub Actions configuration, with nearly half of the migrations requiring multiple commits. We further analyze translations produced by each of the four LLMs and identify 1,121 issues grouped into four categories: logic inconsistencies (38%), platform discrepancies (32%), environment errors (25%), and syntax errors (5%). Finally, we evaluate three enhancement strategies and show that combining guideline-based prompting with iterative refinement achieves the best performance, reaching a Build Success Rate of 75.5%-nearly a threefold improvement over GPT-4o with a basic prompt.

CLJan 9, 2024
Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning

Jiaan Wang, Jianfeng Qu, Kexin Wang et al.

Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, when building KGD systems in real applications, there are various real-world noises that are inevitable to face. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs typically suffer from incompletion and also might contain erroneous and outdated facts. Such real-world noises pose a challenge to the robustness of KGD systems and hinder their applications in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework ensures the KGD model is aware of these two types of perturbations, thus generating informative responses with the potentially noisy inputs in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potentiality. Furthermore, we show that our method can generate better responses than comparison models in both the noisy and the few-shot settings.

CLJul 8, 2025
Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors

Bing Wang, Ximing Li, Mengzhe Ye et al.

Nowadays, misinformation articles, especially multimodal ones, are widely spread on social media platforms and cause serious negative effects. To control their propagation, Multimodal Misinformation Detection (MMD) becomes an active topic in the community to automatically identify misinformation. Previous MMD methods focus on supervising detectors by collecting offline data. However, in real-world scenarios, new events always continually emerge, making MMD models trained on offline data consistently outdated and ineffective. To address this issue, training MMD models under online data streams is an alternative, inducing an emerging task named continual MMD. Unfortunately, it is hindered by two major challenges. First, training on new data consistently decreases the detection performance on past data, named past knowledge forgetting. Second, the social environment constantly evolves over time, affecting the generalization on future data. To alleviate these challenges, we propose to remember past knowledge by isolating interference between event-specific parameters with a Dirichlet process-based mixture-of-expert structure, and anticipate future environmental distributions by learning a continuous-time dynamics model. Accordingly, we induce a new continual MMD method DAEDCMD. Extensive experiments demonstrate that DAEDCMD can consistently and significantly outperform the compared methods, including six MMD baselines and three continual learning methods.

LGJan 25, 2025
PIP: Perturbation-based Iterative Pruning for Large Language Models

Yi Cao, Wei-Jie Xu, Yucheng Shen et al.

The rapid increase in the parameter counts of Large Language Models (LLMs), which often reach into the billions or even trillions, presents significant challenges for their practical deployment, particularly in resource-constrained environments. To address this issue, we propose PIP (Perturbation-based Iterative Pruning), a novel double-view structured pruning method to optimize LLMs, which combines information from two different views: the unperturbed view and the perturbed view. With the calculation of gradient differences, PIP iteratively prunes those that struggle to distinguish between these two views. Our experiments show that PIP reduces the parameter count by approximately 20% while retaining over 85% of the original model's accuracy across varied benchmarks. In some cases, the performance of the pruned model is within 5% of the unpruned version, demonstrating PIP's ability to preserve key aspects of model effectiveness. Moreover, PIP consistently outperforms existing state-of-the-art (SOTA) structured pruning methods, establishing it as a leading technique for optimizing LLMs in constrained environments.

CLMay 16, 2023
Towards Unifying Multi-Lingual and Cross-Lingual Summarization

Jiaan Wang, Fandong Meng, Duo Zheng et al.

To adapt text summarization to the multilingual world, previous work proposes multi-lingual summarization (MLS) and cross-lingual summarization (CLS). However, these two tasks have been studied separately due to the different definitions, which limits the compatible and systematic research on both of them. In this paper, we aim to unify MLS and CLS into a more general setting, i.e., many-to-many summarization (M2MS), where a single model could process documents in any language and generate their summaries also in any language. As the first step towards M2MS, we conduct preliminary studies to show that M2MS can better transfer task knowledge across different languages than MLS and CLS. Furthermore, we propose Pisces, a pre-trained M2MS model that learns language modeling, cross-lingual ability and summarization ability via three-stage pre-training. Experimental results indicate that our Pisces significantly outperforms the state-of-the-art baselines, especially in the zero-shot directions, where there is no training data from the source-language documents to the target-language summaries.

CLJan 29, 2022
Incorporating Commonsense Knowledge into Story Ending Generation via Heterogeneous Graph Networks

Jiaan Wang, Beiqi Zou, Zhixu Li et al.

Story ending generation is an interesting and challenging task, which aims to generate a coherent and reasonable ending given a story context. The key challenges of the task lie in how to comprehend the story context sufficiently and handle the implicit knowledge behind story clues effectively, which are still under-explored by previous work. In this paper, we propose a Story Heterogeneous Graph Network (SHGN) to explicitly model both the information of story context at different granularity levels and the multi-grained interactive relations among them. In detail, we consider commonsense knowledge, words and sentences as three types of nodes. To aggregate non-local information, a global node is also introduced. Given this heterogeneous graph network, the node representations are updated through graph propagation, which adequately utilizes commonsense knowledge to facilitate story comprehension. Moreover, we design two auxiliary tasks to implicitly capture the sentiment trend and key events lie in the context. The auxiliary tasks are jointly optimized with the primary story ending generation task in a multi-task learning strategy. Extensive experiments on the ROCStories Corpus show that the developed model achieves new state-of-the-art performances. Human study further demonstrates that our model generates more reasonable story endings.

CLNov 24, 2021
Knowledge Enhanced Sports Game Summarization

Jiaan Wang, Zhixu Li, Tingyi Zhang et al.

Sports game summarization aims at generating sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, resulting in a lot of noise. Besides, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games. It has 7,854 commentary-news pairs. To improve the quality, K-SportsSum employs a manual cleaning process; (2) Different from existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains the information of 523 sports teams and 14,724 sports players. Additionally, we also introduce a knowledge-enhanced summarizer that utilizes both live commentaries and the knowledge to generate sports news. Extensive experiments on K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performances. Qualitative analysis and human study further verify that our model generates more informative sports news.

CLOct 12, 2021
SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

Jiaan Wang, Zhixu Li, Qiang Yang et al.

Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset, but also proposes a two-step framework. Despite its great contributions, the work has three main drawbacks: 1) the noise existed in SportsSum dataset degrades the summarization performance; 2) the neglect of lexical overlap between news and commentaries results in low-quality pseudo-labeling algorithm; 3) the usage of directly concatenating rewritten sentences to form news limits its practicability. In this paper, we publish a new benchmark dataset SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer to take into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.