LG · Apr 7, 2023
Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets
Xuhui Jiang, Chengjin Xu, Yinghan Shen et al.
The flourishing of knowledge graph applications has driven the need for entity alignment (EA) across KGs. However, the heterogeneity of practical KGs, characterized by differing scales, structures, and limited overlapping entities, greatly surpasses that of existing EA datasets. This discrepancy highlights an oversimplified heterogeneity in current EA datasets, which obstructs a full understanding of the advancements achieved by recent EA methods. In this paper, we study the performance of EA methods in practical settings, specifically focusing on the alignment of highly heterogeneous KGs (HHKGs). First, we address the oversimplified heterogeneity settings of current datasets and propose two new HHKG datasets that closely mimic practical EA scenarios. Then, based on these datasets, we conduct extensive experiments to evaluate previous representative EA methods. Our findings reveal that, in aligning HHKGs, valuable structure information can hardly be exploited through message-passing and aggregation mechanisms. This phenomenon leads to inferior performance of existing EA methods, especially those based on GNNs. These findings shed light on the potential problems associated with the conventional application of GNN-based methods as a panacea for all EA datasets. Consequently, in light of these observations and to elucidate what EA methodology is genuinely beneficial in practical scenarios, we undertake an in-depth analysis by implementing a simple but effective approach: Simple-HHEA. This method adaptively integrates entity name, structure, and temporal information to navigate the challenges posed by HHKGs. Our experimental results suggest that the key to future EA model design in practice lies in adaptability and efficiency under varying information quality conditions, as well as in the capability to capture patterns across HHKGs.
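As a rough illustration of what integrating entity name, structure, and temporal information could look like, the sketch below fuses three per-view cosine similarities with fixed weights and pairs each source entity with its highest-scoring target. This is not the Simple-HHEA implementation: the function names, the weights, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fused_similarity(src, tgt, weights):
    """Weighted sum of per-view cosine similarities.

    src, tgt: dicts mapping a view name ("name", "structure", "time") to a vector.
    weights:  dict mapping the same view names to weights (hypothetical values).
    """
    return sum(w * cosine(src[view], tgt[view]) for view, w in weights.items())

def align(sources, targets, weights):
    """Greedy alignment: pair every source entity with its best-scoring target."""
    return {
        s_id: max(targets, key=lambda t_id: fused_similarity(s_vec, targets[t_id], weights))
        for s_id, s_vec in sources.items()
    }

# Toy data: two entities per KG, three views each (all values invented).
kg1 = {
    "a1": {"name": [1.0, 0.0], "structure": [0.2, 0.8], "time": [1.0, 0.0]},
    "a2": {"name": [0.0, 1.0], "structure": [0.9, 0.1], "time": [0.0, 1.0]},
}
kg2 = {
    "b1": {"name": [0.9, 0.1], "structure": [0.5, 0.5], "time": [1.0, 0.0]},
    "b2": {"name": [0.1, 0.9], "structure": [0.5, 0.5], "time": [0.0, 1.0]},
}
# The structure view is deliberately down-weighted here, since it is
# uninformative in this toy example (both targets share the same vector).
weights = {"name": 0.6, "structure": 0.1, "time": 0.3}
alignment = align(kg1, kg2, weights)
```

In this toy setup the name and time views carry the signal, so `a1` is paired with `b1` and `a2` with `b2`. A practical method would presumably learn or adapt the weights rather than fix them by hand.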
AI · Oct 7, 2023
On the Evolution of Knowledge Graphs: A Survey and Perspective
Xuhui Jiang, Chengjin Xu, Yinghan Shen et al.
Knowledge graphs (KGs) are structured representations of diversified knowledge. They are widely used in various intelligent applications. In this article, we provide a comprehensive survey on the evolution of various types of knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs) and techniques for knowledge extraction and reasoning. Furthermore, we introduce the practical applications of different types of KGs, including a case study in financial analysis. Finally, we propose our perspective on the future directions of knowledge engineering, including the potential of combining the power of knowledge graphs and large language models (LLMs), and the evolution of knowledge extraction, reasoning, and representation.
CL · Nov 23, 2024
A Survey on LLM-as-a-Judge
Jiawei Gu, Xuhui Jiang, Zhichao Shi et al.
Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language Models (LLMs) have achieved remarkable success across diverse domains, leading to the emergence of "LLM-as-a-Judge," where LLMs are employed as evaluators for complex tasks. With their ability to process diverse data types and provide scalable, cost-effective, and consistent assessments, LLMs present a compelling alternative to traditional expert-driven evaluations. However, ensuring the reliability of LLM-as-a-Judge systems remains a significant challenge that requires careful design and standardization. This paper provides a comprehensive survey of LLM-as-a-Judge, addressing the core question: How can reliable LLM-as-a-Judge systems be built? We explore strategies to enhance reliability, including improving consistency, mitigating biases, and adapting to diverse assessment scenarios. Additionally, we propose methodologies for evaluating the reliability of LLM-as-a-Judge systems, supported by a novel benchmark designed for this purpose. To advance the development and real-world deployment of LLM-as-a-Judge systems, we also discuss practical applications, challenges, and future directions. This survey serves as a foundational reference for researchers and practitioners in this rapidly evolving field.
CL · Sep 2, 2025
JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer
Zhichao Shi, Xuhui Jiang, Chengjin Xu et al.
Current evaluation paradigms for large language models (LLMs) suffer from overestimated or biased evaluations and mismatched question difficulty, leading to incomplete evaluations of knowledge and capability boundaries, which hinder their effective application and optimization. To address these challenges, we propose Agent-as-Interviewer, a dynamic evaluation paradigm that employs LLM agents to conduct multi-turn interactions for evaluation. Unlike current benchmarking or dynamic interaction paradigms, Agent-as-Interviewer utilizes agents to invoke knowledge tools for wider and deeper knowledge in the dynamic multi-turn question generation, achieving more comprehensive evaluations of an LLM's knowledge boundaries. It also leverages agents to plan query strategies for adjusting question difficulty levels, enhancing difficulty control to match the actual capabilities of target LLMs. Based on this paradigm, we develop JudgeAgent, a knowledge-wise dynamic evaluation framework that employs knowledge-driven synthesis as the agent's tool and uses difficulty scoring as strategy guidance, ultimately providing valuable suggestions to help targets optimize themselves. Extensive experiments validate the effectiveness of JudgeAgent's suggestions, demonstrating that Agent-as-Interviewer can accurately identify the knowledge and capability boundaries of target models. The source code is available at https://github.com/DataArcTech/JudgeAgent.
CL · Feb 23, 2024
Unlocking the Power of Large Language Models for Entity Alignment
Xuhui Jiang, Yinghan Shen, Zhichao Shi et al.
Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative framework that incorporates large language models (LLMs) to improve EA. To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy. To overcome the over-reliance on entity embedding comparisons, ChatEA implements a two-stage EA strategy that capitalizes on LLMs' capability for multi-step reasoning in a dialogue format, thereby enhancing accuracy while preserving efficiency. Our experimental results verify ChatEA's superior performance, highlighting LLMs' potential in facilitating EA tasks.
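The abstract does not specify the target format of the KG-code translation module. As one hedged guess at the general idea, the sketch below serializes an entity's name and outgoing triples into a small code-like text snippet that could be placed in an LLM prompt. The `entity_to_code` function, the class-style layout, and the sample triples are hypothetical and are not taken from the paper.

```python
def entity_to_code(name, triples):
    """Render an entity and its outgoing triples as a code-like string.

    triples: list of (relation, tail_entity) pairs for the entity `name`.
    The layout is an illustrative assumption, not ChatEA's actual format.
    """
    lines = ["class Entity:", f"    name = {name!r}", "    relations = ["]
    for relation, tail in triples:
        # One line per triple, written as a Python tuple literal.
        lines.append(f"        ({relation!r}, {tail!r}),")
    lines.append("    ]")
    return "\n".join(lines)

snippet = entity_to_code(
    "Marie Curie",
    [("born_in", "Warsaw"), ("field", "Physics")],
)
print(snippet)
```

A structured, code-like rendering of this kind keeps relation names and neighbors explicit, which is one plausible way to let an LLM bring its background knowledge to bear on a candidate entity pair.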
CL · Feb 8, 2025
Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject
Zenghao Duan, Wenbin Duan, Zhiyi Yin et al.
Knowledge editing has become a promising approach for efficiently and precisely updating knowledge embedded in large language models (LLMs). In this work, we focus on Same-Subject Editing, which involves modifying multiple attributes of a single entity to ensure comprehensive and consistent updates to entity-centric knowledge. Through preliminary observation, we identify a significant challenge: current state-of-the-art editing methods struggle when tasked with editing multiple related knowledge pieces for the same subject. To address the lack of relevant editing data for identical subjects in traditional benchmarks, we introduce the $\text{S}^2\text{RKE}$ (Same-Subject Related Knowledge Editing) benchmark. Our extensive experiments reveal that only mainstream locate-then-edit methods, such as ROME and MEMIT, exhibit "related knowledge perturbation," where subsequent edits interfere with earlier ones. Further analysis reveals that these methods over-rely on subject information, neglecting other critical factors, resulting in reduced editing effectiveness.