Yajing Xu

CL
h-index21
16papers
553citations
Novelty49%
AI Score43

16 Papers

CVAug 25, 2022Code
Learning to Construct 3D Building Wireframes from 3D Line Clouds

Yicheng Luo, Jing Ren, Xuefei Zhe et al.

Line clouds, though under-investigated in the previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images. In this work, we propose the first network to process line clouds for building wireframe abstraction. The network takes a line cloud as input , i.e., a nonstructural and unordered set of 3D line segments extracted from multi-view images, and outputs a 3D wireframe of the underlying building, which consists of a sparse set of 3D junctions connected by line segments. We observe that a line patch, i.e., a group of neighboring line segments, encodes sufficient contour information to predict the existence and even the 3D position of a potential junction, as well as the likelihood of connectivity between two query junctions. We therefore introduce a two-layer Line-Patch Transformer to extract junctions and connectivities from sampled line patches to form a 3D building wireframe model. We also introduce a synthetic dataset of multi-view images with ground-truth 3D wireframe. We extensively justify that our reconstructed 3D wireframe models significantly improve upon multiple baseline building reconstruction methods. The code and data can be found at https://github.com/Luo1Cheng/LC2WF.

AIJun 8, 2022Code
Disentangled Ontology Embedding for Zero-shot Learning

Yuxia Geng, Jiaoyan Chen, Wen Zhang et al.

Knowledge Graph (KG) and its variant of ontology have been widely used for knowledge representation, and have shown to be quite effective in augmenting Zero-shot Learning (ZSL). However, existing ZSL methods that utilize KGs all neglect the intrinsic complexity of inter-class relationships represented in KGs. One typical feature is that a class is often related to other classes in different semantic aspects. In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects. We also contribute a new ZSL framework named DOZSL, which contains two new ZSL solutions based on generative models and graph propagation models, respectively, for effectively utilizing the disentangled ontology embeddings. Extensive evaluations have been conducted on five benchmarks across zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). DOZSL often achieves better performance than the state-of-the-art, and its components have been verified by ablation studies and case studies. Our codes and datasets are available at https://github.com/zjukg/DOZSL.

CLOct 10, 2023Code
Making Large Language Models Perform Better in Knowledge Graph Completion

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

Large language model (LLM) based knowledge graph completion (KGC) aims to predict the missing triples in the KGs with LLMs. However, research about LLM-based KGC fails to sufficiently harness LLMs' inference proficiencies, overlooking critical structural information integral to KGs. In this paper, we explore methods to incorporate structural information into the LLMs, with the overarching goal of facilitating structure-aware reasoning. We first discuss on the existing LLM paradigms like in-context learning and instruction tuning, proposing basic structural information injection approaches. Then we propose a Knowledge Prefix Adapter (KoPA) to fulfill this stated goal. The KoPA uses a structural pre-training phase to comprehend the intricate entities and relations within KGs, representing them as structural embeddings. Then KoPA communicates such cross-modal structural information understanding to the LLMs through a knowledge prefix adapter which projects the structural embeddings into the textual space and obtains virtual knowledge tokens positioned as a prefix of the input prompt. We conduct comprehensive experiments and provide incisive analysis concerning how the introduction of cross-modal structural information would be better for LLM's factual knowledge reasoning ability. Our code and data are available at https://github.com/zjukg/KoPA .

LGMar 3, 2023Code
Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer

Wen Zhang, Yushan Zhu, Mingyang Chen et al.

Knowledge graphs (KG) are essential background knowledge providers in many tasks. When designing models for KG-related tasks, one of the key tasks is to devise the Knowledge Representation and Fusion (KRF) module that learns the representation of elements from KGs and fuses them with task representations. While due to the difference of KGs and perspectives to be considered during fusion across tasks, duplicate and ad hoc KRF modules design are conducted among tasks. In this paper, we propose a novel knowledge graph pretraining model KGTransformer that could serve as a uniform KRF module in diverse KG-related tasks. We pretrain KGTransformer with three self-supervised tasks with sampled sub-graphs as input. For utilization, we propose a general prompt-tuning mechanism regarding task data as a triple prompt to allow flexible interactions between task KGs and task data. We evaluate pretrained KGTransformer on three tasks, triple classification, zero-shot image classification, and question answering. KGTransformer consistently achieves better results than specifically designed task models. Through experiments, we justify that the pretrained KGTransformer could be used off the shelf as a general and effective KRF module across KG-related tasks. The code and datasets are available at https://github.com/zjukg/KGTransformer.

MMMar 28, 2024Code
NativE: Multi-modal Knowledge Graph Completion in the Wild

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

Multi-modal knowledge graph completion (MMKGC) aims to automatically discover the unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and multi-modal information from entities. However, real-world MMKGs present challenges due to their diverse and imbalanced nature, which means that the modality information can span various types (e.g., image, text, numeric, audio, video) but its distribution among entities is uneven, leading to missing modalities for certain entities. Existing works usually focus on common modalities like image and text while neglecting the imbalanced distribution phenomenon of modal information. To address these issues, we propose a comprehensive framework NativE to achieve MMKGC in the wild. NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities and employs a collaborative modality adversarial training framework to augment the imbalanced modality information. We construct a new benchmark called WildKGC with five datasets to evaluate our method. The empirical results compared with 21 recent baselines confirm the superiority of our method, consistently achieving state-of-the-art performance across different datasets and various scenarios while keeping efficient and generalizable. Our code and data are released at https://github.com/zjukg/NATIVE

LGJul 3, 2024
SF-GNN: Self Filter for Message Lossless Propagation in Deep Graph Neural Network

Yushan Zhu, Wen Zhang, Yajing Xu et al.

Graph Neural Network (GNN), with the main idea of encoding graph structure information of graphs by propagation and aggregation, has developed rapidly. It achieved excellent performance in representation learning of multiple types of graphs such as homogeneous graphs, heterogeneous graphs, and more complex graphs like knowledge graphs. However, merely stacking GNN layers may not improve the model's performance and can even be detrimental. For the phenomenon of performance degradation in deep GNNs, we propose a new perspective. Unlike the popular explanations of over-smoothing or over-squashing, we think the issue arises from the interference of low-quality node representations during message propagation. We introduce a simple and general method, SF-GNN, to address this problem. In SF-GNN, we define two representations for each node, one is the node representation that represents the feature of the node itself, and the other is the message representation specifically for propagating messages to neighbor nodes. A self-filter module evaluates the quality of the node representation and decides whether to integrate it into the message propagation based on this quality assessment. Experiments on node classification tasks for both homogeneous and heterogeneous graphs, as well as link prediction tasks on knowledge graphs, demonstrate that our method can be applied to various GNN models and outperforms state-of-the-art baseline methods in addressing deep GNN degradation.

AIApr 15, 2024Code
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs, collaboratively leveraging structural information from the triples and multi-modal information of the entities to overcome the inherent incompleteness. Existing MMKGC methods usually extract multi-modal features with pre-trained models, resulting in coarse handling of multi-modal entity information, overlooking the nuanced, fine-grained semantic details and their complex interactions. To tackle this shortfall, we introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities and enhance the MMKGC performance. Motivated by the tokenization technology, MyGO tokenizes multi-modal entity information as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks reveal that our method surpasses 19 of the latest models, underlining its superior performance. Code and data can be found in https://github.com/zjukg/MyGO

CLMar 8, 2023
Query-Utterance Attention with Joint modeling for Query-Focused Meeting Summarization

Xingxian Liu, Bin Duan, Bo Xiao et al.

Query-focused meeting summarization (QFMS) aims to generate summaries from meeting transcripts in response to a given query. Previous works typically concatenate the query with meeting transcripts and implicitly model the query relevance only at the token level with attention mechanism. However, due to the dilution of key query-relevant information caused by long meeting transcripts, the original transformer-based model is insufficient to highlight the key parts related to the query. In this paper, we propose a query-aware framework with joint modeling token and utterance based on Query-Utterance Attention. It calculates the utterance-level relevance to the query with a dense retrieval module. Then both token-level query relevance and utterance-level query relevance are combined and incorporated into the generation process with attention mechanism explicitly. We show that the query relevance of different granularities contributes to generating a summary more related to the query. Experimental results on the QMSum dataset show that the proposed model achieves new state-of-the-art performance.

CVJun 2, 2025Code
Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

Multi-modal large language models (MLLMs) incorporate heterogeneous modalities into LLMs, enabling a comprehensive understanding of diverse scenarios and objects. Despite the proliferation of evaluation benchmarks and leaderboards for MLLMs, they predominantly overlook the critical capacity of MLLMs to comprehend world knowledge with structured abstractions that appear in visual form. To address this gap, we propose a novel evaluation paradigm and devise M3STR, an innovative benchmark grounded in the Multi-Modal Map for STRuctured understanding. This benchmark leverages multi-modal knowledge graphs to synthesize images encapsulating subgraph architectures enriched with multi-modal entities. M3STR necessitates that MLLMs not only recognize the multi-modal entities within the visual inputs but also decipher intricate relational topologies among them. We delineate the benchmark's statistical profiles and automated construction pipeline, accompanied by an extensive empirical analysis of 26 state-of-the-art MLLMs. Our findings reveal persistent deficiencies in processing abstractive visual information with structured knowledge, thereby charting a pivotal trajectory for advancing MLLMs' holistic reasoning capacities. Our code and data are released at https://github.com/zjukg/M3STR

AIJun 26, 2024Code
Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion

Wen Zhang, Yajing Xu, Peng Ye et al.

Knowledge graph (KG) completion aims to find out missing triples in a KG. Some tasks, such as link prediction and instance completion, have been proposed for KG completion. They are triple-level tasks with some elements in a missing triple given to predict the missing element of the triple. However, knowing some elements of the missing triple in advance is not always a realistic setting. In this paper, we propose a novel graph-level automatic KG completion task called Triple Set Prediction (TSP) which assumes none of the elements in the missing triples is given. TSP is to predict a set of missing triples given a set of known triples. To properly and accurately evaluate this new task, we propose 4 evaluation metrics including 3 classification metrics and 1 ranking metric, considering both the partial-open-world and the closed-world assumptions. Furthermore, to tackle the huge candidate triples for prediction, we propose a novel and efficient subgraph-based method GPHT that can predict the triple set fast. To fairly compare the TSP results, we also propose two types of methods RuleTensor-TSP and KGE-TSP applying the existing rule- and embedding-based methods for TSP as baselines. During experiments, we evaluate the proposed methods on two datasets extracted from Wikidata following the relation-similarity partial-open-world assumption proposed by us, and also create a complete family data set to evaluate TSP results following the closed-world assumption. Results prove that the methods can successfully generate a set of missing triples and achieve reasonable scores on the new task, and GPHT performs better than the baselines with significantly shorter prediction time. The datasets and code for experiments are available at https://github.com/zjukg/GPHT-for-TSP.

LGFeb 25, 2022Code
NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Wen Zhang, Xiangnan Chen, Zhen Yao et al.

NeuralKG is an open-source Python-based library for diverse representation learning of knowledge graphs. It implements three different series of Knowledge Graph Embedding (KGE) methods, including conventional KGEs, GNN-based KGEs, and Rule-based KGEs. With a unified framework, NeuralKG successfully reproduces link prediction results of these methods on benchmarks, freeing users from the laborious task of reimplementing them, especially for some methods originally written in non-python programming languages. Besides, NeuralKG is highly configurable and extensible. It provides various decoupled modules that can be mixed and adapted to each other. Thus with NeuralKG, developers and researchers can quickly implement their own designed models and obtain the optimal training methods to achieve the best performance efficiently. We built an website in http://neuralkg.zjukg.cn to organize an open and shared KG representation learning community. The source code is all publicly released at https://github.com/zjukg/NeuralKG.

CLDec 31, 2024
Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking

Yichi Zhang, Zhuo Chen, Lingbing Guo et al.

Large language models (LLMs) have demonstrated exceptional performance in text generation within current NLP research. However, the lack of factual accuracy is still a dark cloud hanging over the LLM skyscraper. Structural knowledge prompting (SKP) is a prominent paradigm to integrate external knowledge into LLMs by incorporating structural representations, achieving state-of-the-art results in many knowledge-intensive tasks. However, existing methods often focus on specific problems, lacking a comprehensive exploration of the generalization and capability boundaries of SKP. This paper aims to evaluate and rethink the generalization capability of the SKP paradigm from four perspectives including Granularity, Transferability, Scalability, and Universality. To provide a thorough evaluation, we introduce a novel multi-granular, multi-level benchmark called SUBARU, consisting of 9 different tasks with varying levels of granularity and difficulty.

AIApr 18, 2025
Multi-modal Knowledge Graph Generation with Semantics-enriched Prompts

Yajing Xu, Zhiqiang Liu, Jiaoyan Chen et al.

Multi-modal Knowledge Graphs (MMKGs) have been widely applied across various domains for knowledge representation. However, the existing MMKGs are significantly fewer than required, and their construction faces numerous challenges, particularly in ensuring the selection of high-quality, contextually relevant images for knowledge graph enrichment. To address these challenges, we present a framework for constructing MMKGs from conventional KGs. Furthermore, to generate higher-quality images that are more relevant to the context in the given knowledge graph, we designed a neighbor selection method called Visualizable Structural Neighbor Selection (VSNS). This method consists of two modules: Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS). The VNS module filters relations that are difficult to visualize, while the SNS module selects neighbors that most effectively capture the structural characteristics of the entity. To evaluate the quality of the generated images, we performed qualitative and quantitative evaluations on two datasets, MKG-Y and DB15K. The experimental results indicate that using the VSNS method to select neighbors results in higher-quality images that are more relevant to the knowledge graph.

CLDec 24, 2024
Is Large Language Model Good at Triple Set Prediction? An Empirical Study

Yuan Yuan, Yajing Xu, Wen Zhang

The core of the Knowledge Graph Completion (KGC) task is to predict and complete the missing relations or nodes in a KG. Common KGC tasks are mostly about inferring unknown elements with one or two elements being known in a triple. In comparison, the Triple Set Prediction (TSP) task is a more realistic knowledge graph completion task. It aims to predict all elements of unknown triples based on the information from known triples. In recent years, large language models (LLMs) have exhibited significant advancements in language comprehension, demonstrating considerable potential for KGC tasks. However, the potential of LLM on the TSP task has not yet to be investigated. Thus in this paper we proposed a new framework to explore the strengths and limitations of LLM in the TSP task. Specifically, the framework consists of LLM-based rule mining and LLM-based triple set prediction. The relation list of KG embedded within rich semantic information is first leveraged to prompt LLM in the generation of rules. This process is both efficient and independent of statistical information, making it easier to mine effective and realistic rules. For each subgraph, the specified rule is applied in conjunction with the relevant triples within that subgraph to guide the LLM in predicting the missing triples. Subsequently, the predictions from all subgraphs are consolidated to derive the complete set of predicted triples on KG. Finally, the method is evaluated on the relatively complete CFamily dataset. The experimental results indicate that when LLMs are required to adhere to a large amount of factual knowledge to predict missing triples, significant hallucinations occurs, leading to a noticeable decline in performance. To further explore the causes of this phenomenon, this paper presents a comprehensive analysis supported by a detailed case study.

CLMay 22, 2023
Learning to Rank Utterances for Query-Focused Meeting Summarization

Xingxian Liu, Yajing Xu

Query-focused meeting summarization(QFMS) aims to generate a specific summary for the given query according to the meeting transcripts. Due to the conflict between long meetings and limited input size, previous works mainly adopt extract-then-summarize methods, which use extractors to simulate binary labels or ROUGE scores to extract utterances related to the query and then generate a summary. However, the previous approach fails to fully use the comparison between utterances. To the extractor, comparison orders are more important than specific scores. In this paper, we propose a Ranker-Generator framework. It learns to rank the utterances by comparing them in pairs and learning from the global orders, then uses top utterances as the generator's input. We show that learning to rank utterances helps to select utterances related to the query effectively, and the summarizer can benefit from it. Experimental results on QMSum show that the proposed model outperforms all existing multi-stage models with fewer parameters.

CVMay 11, 2019
Cyclone intensity estimate with context-aware cyclegan

Yajing Xu, Haitao Yang, Mingfei Cheng et al.

Deep learning approaches to cyclone intensity estimationhave recently shown promising results. However, sufferingfrom the extreme scarcity of cyclone data on specific in-tensity, most existing deep learning methods fail to achievesatisfactory performance on cyclone intensity estimation,especially on classes with few instances. To avoid the degra-dation of recognition performance caused by scarce samples,we propose a context-aware CycleGAN which learns the la-tent evolution features from adjacent cyclone intensity andsynthesizes CNN features of classes lacking samples fromunpaired source classes. Specifically, our approach synthe-sizes features conditioned on the learned evolution features,while the extra information is not required. Experimentalresults of several evaluation methods show the effectivenessof our approach, even can predicting unseen classes.