Xiaofang Zhou

LG
h-index25
62papers
1,127citations
Novelty50%
AI Score59

62 Papers

CRJun 3
Preserving Data Privacy in Learning Causal Structure with Fully Homomorphic Encryption

Jian Yang, Yuan Tong, Qinbin Li et al.

Preserving data privacy is an important topic in structural data management and data mining. However, the issue of privacy leakage in distributed causal structure learning is a persistent challenge, especially in cases where data transmission and computation are required. In this paper, we propose a method based on fully homomorphic encryption (FHE) that performs calculations on ciphertexts, keeping data encrypted in transition and computation. Nevertheless, adopting FHE to causal structure learning is challenging due to the high computation cost and limited support on division as well as logarithm operations in FHE. To tackle this challenge, we propose a series of novel techniques including (i) circuit simplification for better efficiency, (ii) approximation of division and logarithm through Newton-Raphson Reciprocal and Taylor expansion, and (iii) a batching technique with SIMD-acceleration to enhance the whole learning process. Additionally, our method can be easily extended beyond FHE by demonstration of its portability to support differential privacy. Empirical results show that our method achieves high consistency and comparable causal structure with the plaintext version in the datasets tested. Last, our method is efficient and practical to complete learning causal structures in tens of minutes even under the privacy protection of FHE.

DBApr 21, 2022
A Learned Index for Exact Similarity Search in Metric Spaces

Yao Tian, Tingyun Yan, Xi Zhao et al.

Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively explored to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains to be an open challenge. In this paper, we propose a novel indexing approach called LIMS that uses data clustering, pivot-based data transformation techniques and learned indexes to support efficient similarity query processing in metric spaces. In LIMS, the underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.

LGJul 1, 2022
Time-aware Dynamic Graph Embedding for Asynchronous Structural Evolution

Yu Yang, Hongzhi Yin, Jiannong Cao et al.

Dynamic graphs refer to graphs whose structure dynamically changes over time. Despite the benefits of learning vertex representations (i.e., embeddings) for dynamic graphs, existing works merely view a dynamic graph as a sequence of changes within the vertex connections, neglecting the crucial asynchronous nature of such dynamics where the evolution of each local structure starts at different times and lasts for various durations. To maintain asynchronous structural evolutions within the graph, we innovatively formulate dynamic graphs as temporal edge sequences associated with joining time of vertices (ToV) and timespan of edges (ToE). Then, a time-aware Transformer is proposed to embed vertices' dynamic connections and ToEs into the learned vertex representations. Meanwhile, we treat each edge sequence as a whole and embed its ToV of the first vertex to further encode the time-sensitive information. Extensive evaluations on several datasets show that our approach outperforms the state-of-the-art in a wide range of graph mining tasks. At the same time, it is very efficient and scalable for embedding large-scale dynamic graphs.

AIMar 12, 2022
Ensemble Semi-supervised Entity Alignment via Cycle-teaching

Kexuan Xin, Zequn Sun, Wen Hua et al.

Entity alignment is to find identical entities in different knowledge graphs. Although embedding-based entity alignment has recently achieved remarkable progress, training data insufficiency remains a critical challenge. Conventional semi-supervised methods also suffer from the incorrect entity alignment in newly proposed training data. To resolve these issues, we design an iterative cycle-teaching framework for semi-supervised entity alignment. The key idea is to train multiple entity alignment models (called aligners) simultaneously and let each aligner iteratively teach its successor the proposed new entity alignment. We propose a diversity-aware alignment selection method to choose reliable entity alignment for each aligner. We also design a conflict resolution mechanism to resolve the alignment conflict when combining the new alignment of an aligner and that from its teacher. Besides, considering the influence of cycle-teaching order, we elaborately design a strategy to arrange the optimal order that can maximize the overall performance of multiple aligners. The cycle-teaching process can break the limitations of each model's learning capability and reduce the noise in new training data, leading to improved performance. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed cycle-teaching framework, which significantly outperforms the state-of-the-art models when the training data is insufficient and the new entity alignment has much noise.

LGJul 11, 2022
Few-Shot Semantic Relation Prediction across Heterogeneous Graphs

Pengfei Ding, Yan Wang, Guanfeng Liu et al.

Semantic relation prediction aims to mine the implicit relationships between objects in heterogeneous graphs, which consist of different types of objects and different types of links. In real-world scenarios, new semantic relations constantly emerge and they typically appear with only a few labeled data. Since a variety of semantic relations exist in multiple heterogeneous graphs, the transferable knowledge can be mined from some existing semantic relations to help predict the new semantic relations with few labeled data. This inspires a novel problem of few-shot semantic relation prediction across heterogeneous graphs. However, the existing methods cannot solve this problem because they not only require a large number of labeled samples as input, but also focus on a single graph with a fixed heterogeneity. Targeting this novel and challenging problem, in this paper, we propose a Meta-learning based Graph neural network for Semantic relation prediction, named MetaGS. Firstly, MetaGS decomposes the graph structure between objects into multiple normalized subgraphs, then adopts a two-view graph neural network to capture local heterogeneous information and global structure information of these subgraphs. Secondly, MetaGS aggregates the information of these subgraphs with a hyper-prototypical network, which can learn from existing semantic relations and adapt to new semantic relations. Thirdly, using the well-initialized two-view graph neural network and hyper-prototypical network, MetaGS can effectively learn new semantic relations from different graphs while overcoming the limitation of few labeled data. Extensive experiments on three real-world datasets have demonstrated the superior performance of MetaGS over the state-of-the-art methods.

LGAug 23, 2022
Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding

Kexuan Xin, Zequn Sun, Wen Hua et al.

Entity alignment is a crucial task in knowledge graph fusion. However, most entity alignment approaches have the scalability problem. Recent methods address this issue by dividing large KGs into small blocks for embedding and alignment learning in each. However, such a partitioning and learning process results in an excessive loss of structure and alignment. Therefore, in this work, we propose a scalable GNN-based entity alignment approach to reduce the structure and alignment loss from three perspectives. First, we propose a centrality-based subgraph generation algorithm to recall some landmark entities serving as the bridges between different subgraphs. Second, we introduce self-supervised entity reconstruction to recover entity representations from incomplete neighborhood subgraphs, and design cross-subgraph negative sampling to incorporate entities from other subgraphs in alignment learning. Third, during the inference process, we merge the embeddings of subgraphs to make a single space for alignment search. Experimental results on the benchmark OpenEA dataset and the proposed large DBpedia1M dataset verify the effectiveness of our approach.

DBMay 14
Toward Temporal Attribution Analytics in Dataflows

Chrysanthi Kosyfaki, Ruiyuan Zhang, Nikos Mamoulis et al.

Data provenance (the process of determining the origin and derivation of data outputs) has applications across multiple domains including explaining database query results and auditing scientific workflows. Despite decades of research, provenance tracing remains challenging due to its high computational cost and storage requirements. In streaming systems such as Apache Flink, fine-grained provenance graphs can grow super-linearly with data volume, posing significant scalability challenges. We define temporal attribution, a new lightweight form of provenance, appropriate for certain tasks, such as monitoring dependencies between system components over time quantitatively. Temporal attribution enables time-focused analysis that does not require fine-grained, tuple-level dependency meta-data. Inspired by volume-based provenance tracking in Temporal Interaction Networks (TINs), we demonstrate TINs' applicability in succinctly modeling quantified data exchanges between dataflow operators in stream data processing systems and in processing workflows, in general, over time. We classify data into discrete and liquid types, define five temporal provenance query types, and propose a state-based indexing approach. Our vision outlines research directions toward making this new form of temporal attribution a practical tool for large-scale dataflow analytics.

CVMay 27
SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

Jiawei Li, Ziyi Liu, Weijie Shi et al.

3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding together with dialog, QA, and captioning, yet many rely on a single pointer-style grounding decision that compresses a relational instruction into one selection. This is brittle for fine-grained queries where multiple same-class candidates must be ruled out by context objects and spatial relations. We propose Structured Spatial Reasoning 3D-LLM (SSR3D-LLM), a structured grounding interface for unified 3D-LLMs. Given fixed Mask3D object proposals, the LLM writes a sequence of latent spatial reasoning steps and memory tokens from the query, and a geometry-aware scorer reads these latent steps in order to refine candidate rankings step by step with step-length masking. The latent steps are learned from standard benchmark target supervision with auxiliary referential-cue supervision during training, while inference uses only the input query and Mask3D proposals. Across ReferIt3D, ScanRefer, and Multi3DRef, SSR3D-LLM achieves the strongest results among unified 3D-LLM baselines, with substantial gains over the single-pointer QPG baseline on fine-grained grounding and consistent improvements over prior unified 3D-LLMs, while preserving the default language-task route.

LGJan 7Code
R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

Weijie Shi, Yanxi Chen, Zexi Li et al.

Reinforcement learning drives recent advances in LLM reasoning and agentic capabilities, yet current approaches struggle with both exploration and exploitation. Exploration suffers from low success rates on difficult tasks and high costs of repeated rollouts from scratch. Exploitation suffers from coarse credit assignment and training instability: Trajectory-level rewards penalize valid prefixes for later errors, and failure-dominated groups overwhelm the few positive signals, leaving optimization without constructive direction. To this end, we propose R$^3$L, Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification. To synthesize high-quality trajectories, R$^3$L shifts from stochastic sampling to active synthesis via reflect-then-retry, leveraging language feedback to diagnose errors, transform failed attempts into successful ones, and reduce rollout costs by restarting from identified failure points. With errors diagnosed and localized, Pivotal Credit Assignment updates only the diverging suffix where contrastive signals exist, excluding the shared prefix from gradient update. Since failures dominate on difficult tasks and reflect-then-retry produces off-policy data, risking training instability, Positive Amplification upweights successful trajectories to ensure positive signals guide the optimization process. Experiments on agentic and reasoning tasks demonstrate 5\% to 52\% relative improvements over baselines while maintaining training stability. Our code is released at https://github.com/shiweijiezero/R3L.

DBApr 28
VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines

Kai Huang, Houdong Liang, Chongchong Yao et al.

Visual Graph Query Interfaces (VQIs) empower non-programmers to query graph data by constructing visual queries intuitively. Devising efficient technologies in Graph Query Engines (GQEs) for interactive search and exploration has also been studied for years. However, these two vibrant scientific fields are traditionally independent of each other, causing a vast barrier for users who wish to explore the full-stack operations of graph querying. In this demonstration, we propose a novel VQI system built upon Neo4j called VisualNeo that facilities an efficient subgraph query in large graph databases. VisualNeo inherits several advanced features from recent advanced VQIs, which include the data-driven gui design and canned pattern generation. Additionally, it embodies a database manager module in order that users can connect to generic Neo4j databases. It performs query processing through the Neo4j driver and provides an aesthetic query result exploration.

CLFeb 2Code
Breaking the Static Graph: Context-Aware Traversal for Robust Retrieval-Augmented Generation

Kwun Hang Lau, Fangyuan Zhang, Boyu Ruan et al.

Recent advances in Retrieval-Augmented Generation (RAG) have shifted from simple vector similarity to structure-aware approaches like HippoRAG, which leverage Knowledge Graphs (KGs) and Personalized PageRank (PPR) to capture multi-hop dependencies. However, these methods suffer from a "Static Graph Fallacy": they rely on fixed transition probabilities determined during indexing. This rigidity ignores the query-dependent nature of edge relevance, causing semantic drift where random walks are diverted into high-degree "hub" nodes before reaching critical downstream evidence. Consequently, models often achieve high partial recall but fail to retrieve the complete evidence chain required for multi-hop queries. To address this, we propose CatRAG, Context-Aware Traversal for robust RAG, a framework that builds on the HippoRAG 2 architecture and transforms the static KG into a query-adaptive navigation structure. We introduce a multi-faceted framework to steer the random walk: (1) Symbolic Anchoring, which injects weak entity constraints to regularize the random walk; (2) Query-Aware Dynamic Edge Weighting, which dynamically modulates graph structure, to prune irrelevant paths while amplifying those aligned with the query's intent; and (3) Key-Fact Passage Weight Enhancement, a cost-efficient bias that structurally anchors the random walk to likely evidence. Experiments across four multi-hop benchmarks demonstrate that CatRAG consistently outperforms state of the art baselines. Our analysis reveals that while standard Recall metrics show modest gains, CatRAG achieves substantial improvements in reasoning completeness, the capacity to recover the entire evidence path without gaps. These results reveal that our approach effectively bridges the gap between retrieving partial context and enabling fully grounded reasoning. Resources are available at https://github.com/kwunhang/CatRAG.

LGAug 14, 2023
Search to Fine-tune Pre-trained Graph Neural Networks for Graph-level Tasks

Zhili Wang, Shimin Di, Lei Chen et al.

Recently, graph neural networks (GNNs) have shown its unprecedented success in many graph-related tasks. However, GNNs face the label scarcity issue as other neural networks do. Thus, recent efforts try to pre-train GNNs on a large-scale unlabeled graph and adapt the knowledge from the unlabeled graph to the target downstream task. The adaptation is generally achieved by fine-tuning the pre-trained GNNs with a limited number of labeled data. Despite the importance of fine-tuning, current GNNs pre-training works often ignore designing a good fine-tuning strategy to better leverage transferred knowledge and improve the performance on downstream tasks. Only few works start to investigate a better fine-tuning strategy for pre-trained GNNs. But their designs either have strong assumptions or overlook the data-aware issue for various downstream datasets. Therefore, we aim to design a better fine-tuning strategy for pre-trained GNNs to improve the model performance in this paper. Given a pre-trained GNN, we propose to search to fine-tune pre-trained graph neural networks for graph-level tasks (S2PGNN), which adaptively design a suitable fine-tuning framework for the given labeled data on the downstream task. To ensure the improvement brought by searching fine-tuning strategy, we carefully summarize a proper search space of fine-tuning framework that is suitable for GNNs. The empirical studies show that S2PGNN can be implemented on the top of 10 famous pre-trained GNNs and consistently improve their performance. Besides, S2PGNN achieves better performance than existing fine-tuning strategies within and outside the GNN area. Our code is publicly available at \url{https://anonymous.4open.science/r/code_icde2024-A9CB/}.

LGJul 25, 2024
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models

Haoyu Tang, Ye Liu, Xi Zhao et al.

Recent advances in machine learning, particularly in Natural Language Processing (NLP), have produced powerful models trained on vast datasets. However, these models risk leaking sensitive information, raising privacy concerns. In response, regulatory measures such as the European Union's General Data Protection Regulation (GDPR) have driven increasing interest in Machine Unlearning techniques, which enable models to selectively forget specific data entries. Early unlearning approaches primarily relied on pre-processing methods, while more recent research has shifted towards training-based solutions. Despite their effectiveness, a key limitation persists: most methods require access to original training data, which is often unavailable. Additionally, directly applying unlearning techniques bears the cost of undermining the model's expressive capabilities. To address these challenges, we introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components: A Knowledge Unlearning Induction module designed to target specific knowledge for removal using an unlearning loss; A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal; And an Iterative Unlearning Refinement module that dynamically adjusts the unlearning process through ongoing evaluation and updates. Experimental results demonstrate the efficacy of our ICU method in unlearning sensitive information while maintaining the model's overall performance, offering a promising solution for privacy-conscious machine learning applications.

LGMar 21, 2024Code
Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond

Wei Chen, Yuxuan Liang, Yuanshao Zhu et al.

Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptability to real-world complexities. In this paper, we present a comprehensive review of the development and recent advances in deep learning for trajectory computing (DL4Traj). We first define trajectory data and provide a brief overview of widely-used deep learning models. Systematically, we explore deep learning applications in trajectory management (pre-processing, storage, analysis, and visualization) and mining (trajectory-related forecasting, trajectory-related recommendation, trajectory classification, travel time estimation, anomaly detection, and mobility generation). Notably, we encapsulate recent advancements in Large Language Models (LLMs) that hold the potential to augment trajectory computing. Additionally, we summarize application scenarios, public datasets, and toolkits. Finally, we outline current challenges in DL4Traj research and propose future directions. Relevant papers and open-source resources have been collated and are continuously updated at: \href{https://github.com/yoshall/Awesome-Trajectory-Computing}{DL4Traj Repo}.

IRApr 17
Intent Propagation Contrastive Collaborative Filtering

Haojie Li, Junwei Du, Guanfeng Liu et al.

Disentanglement techniques used in collaborative filtering uncover interaction intents between nodes, improving the interpretability of node representations and enhancing recommendation performance. However, existing disentanglement methods still face two problems. First, they focus on local structural features derived from direct node interactions and overlook the comprehensive graph structure, which limits disentanglement accuracy. Second, the disentanglement process depends on backpropagation signals derived from recommendation tasks and lacks direct supervision, which may lead to biases and overfitting. To address these issues, we propose the Intent Propagation Contrastive Collaborative Filtering (IPCCF) algorithm. Specifically, we design a double helix message propagation framework to more effectively extract the deep semantic information of nodes, thereby improving the model's understanding of interactions between nodes. We also develop an intent message propagation method that incorporates graph structure information into the disentanglement process, thereby expanding the consideration scope of disentanglement. In addition, contrastive learning techniques are employed to align node representations derived from structure and intents, providing direct supervision for the disentanglement process, mitigating biases, and enhancing the model's robustness to overfitting. Experiments on three real data graphs illustrate the superiority of the proposed approach.

CLJul 12, 2024
Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection

Ye Liu, Jiajun Zhu, Xukai Liu et al.

Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learning abilities. However, existing methods face significant limitations, such as the Understanding Ambiguity and Information Scarcity, which significantly undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Knowledge-guided Fake News Detection (DKFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DKFND first identifies the knowledge concepts of each news article through a Detection Module. Subsequently, DKFND creatively designs an Investigation Module to retrieve inside and outside valuable information concerning to the current news, followed by another Judge Module to evaluate the relevance and confidence of them. Finally, a Determination Module further derives two respective predictions and obtain the final result. Extensive experiments on two public datasets show the efficacy of our proposed method, particularly in low-resource settings.

NISep 3, 2024
Three Pillars Towards Next-Generation Routing System

Lei Li, Mengxuan Zhang, Zizhuo Xu et al.

The routing results are playing an increasingly important role in transportation efficiency, but they could generate traffic congestion unintentionally. This is because the traffic condition and routing system are disconnected components in the current routing paradigm. In this paper, we propose a next-generation routing paradigm that could reduce traffic congestion by considering the influence of the routing results in real-time. Specifically, we regard the routing results as the root cause of the future traffic flow, which at the same time is identified as the root cause of traffic conditions. To implement such a system, we identify three essential components: 1) the traffic condition simulation that establishes the relation between traffic flow and traffic condition with guaranteed accuracy; 2) the future route management that supports efficient simulation with dynamic route update; 3) the global routing optimization that improves the overall transportation system efficiency. Preliminary design and experimental results will be presented, and the corresponding challenges and research directions will also be discussed.

LGMay 3, 2024Code
A Survey on Contribution Evaluation in Vertical Federated Learning

Yue Cui, Chung-ju Huang, Yuzhu Zhang et al.

Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and accurate evaluation of each entity's contribution to the learning process. This is crucial for maintaining trust among participating entities, ensuring equitable resource sharing, and fostering a sustainable collaboration framework. This paper provides a thorough review of contribution evaluation in VFL. We categorize the vast array of contribution evaluation techniques along the VFL lifecycle, granularity of evaluation, privacy considerations, and core computational methods. We also explore various tasks in VFL that involving contribution evaluation and analyze their required evaluation properties and relation to the VFL lifecycle phases. Finally, we present a vision for the future challenges of contribution evaluation in VFL. By providing a structured analysis of the current landscape and potential advancements, this paper aims to guide researchers and practitioners in the design and implementation of more effective, efficient, and privacy-centric VFL solutions. Relevant literature and open-source resources have been compiled and are being continuously updated at the GitHub repository: \url{https://github.com/cuiyuebing/VFL_CE}.

LGAug 13, 2024
Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models

Jialiang Wang, Hanmo Liu, Shimin Di et al.

High-level automation is increasingly critical in AI, driven by rapid advances in large language models (LLMs) and AI agents. However, LLMs, despite their general reasoning power, struggle significantly in specialized, data-sensitive tasks such as designing Graph Neural Networks (GNNs). This difficulty arises from (1) the inherent knowledge gaps in modeling the intricate, varying relationships between graph properties and suitable architectures and (2) the external noise from misleading descriptive inputs, often resulting in generic or even misleading model suggestions. Achieving proficiency in designing data-aware models -- defined as the meta-level capability to systematically accumulate, interpret, and apply data-specific design knowledge -- remains challenging for existing automated approaches, due to their inefficient construction and application of meta-knowledge. To achieve meta-level proficiency, we propose DesiGNN, a knowledge-centered framework that systematically converts past model design experience into structured, fine-grained knowledge priors well-suited for meta-learning with LLMs. To account for the inherent variability and external noise, DesiGNN aligns empirical property filtering from extensive benchmarks with adaptive elicitation of literature insights via LLMs. By constructing a solid meta-knowledge between unseen graph understanding and known effective architecture patterns, DesiGNN can deliver top-5.77% initial model proposals for unseen datasets within seconds and achieve consistently superior performance with minimal search cost compared to baselines.

IRJun 26, 2025Code
EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Fangyuan Zhang, Zhengjun Huang, Yingli Zhou et al.

Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRag achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.

LGFeb 24
UrbanFM: Scaling Urban Spatio-Temporal Foundation Models

Wei Chen, Yuqian Wu, Junle Chen et al.

Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like genomics and meteorology, urban computing remains fragmented due to "scenario-specific" models, which are overfitted to specific regions or tasks, hindering their generalizability. To bridge this gap and advance spatio-temporal foundation models for urban systems, we adopt scaling as the central perspective and systematically investigate two key questions: what to scale and how to scale. Grounded in first-principles analysis, we identify three critical dimensions: heterogeneity, correlation, and dynamics, aligning these principles with the fundamental scientific properties of urban spatio-temporal data. Specifically, to address heterogeneity through data scaling, we construct WorldST. This billion-scale corpus standardizes diverse physical signals, such as traffic flow and speed, from over 100 global cities into a unified data format. To enable computation scaling for modeling correlations, we introduce the MiniST unit, a novel split mechanism that discretizes continuous spatio-temporal fields into learnable computational units to unify representations of grid-based and sensor-based observations. Finally, addressing dynamics via architecture scaling, we propose UrbanFM, a minimalist self-attention architecture designed with limited inductive biases to autonomously learn dynamic spatio-temporal dependencies from massive data. Furthermore, we establish EvalST, the largest-scale urban spatio-temporal benchmark to date. Extensive experiments demonstrate that UrbanFM achieves remarkable zero-shot generalization across unseen cities and tasks, marking a pivotal first step toward large-scale urban spatio-temporal foundation models.

LGFeb 22
Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Wei Chen, Junle Chen, Yuqian Wu et al.

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a significant computational bottleneck. Existing solutions typically focus on optimizing model architectures or optimizers, while overlooking the inherent inefficiency of the training data itself. This conventional approach of iterating over the entire static dataset each epoch wastes considerable resources on easy-to-learn or repetitive samples. In this paper, we explore a novel training-efficiency techniques, namely learning from complexity with dynamic sample pruning, ST-Prune, for spatio-temporal forecasting. Through dynamic sample pruning, we aim to intelligently identify the most informative samples based on the model's real-time learning state, thereby accelerating convergence and improving training efficiency. Extensive experiments conducted on real-world spatio-temporal datasets show that ST-Prune significantly accelerates the training speed while maintaining or even improving the model performance, and it also has scalability and universality.

LGMar 22, 2024Code
Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

Shuhao Li, Yue Cui, Jingyi Xu et al.

Traffic prediction has long been a focal and pivotal area in research, witnessing both significant strides from city-level to road-level predictions in recent years. With the advancement of Vehicle-to-Everything (V2X) technologies, autonomous driving, and large-scale models in the traffic domain, lane-level traffic prediction has emerged as an indispensable direction. However, further progress in this field is hindered by the absence of comprehensive and unified evaluation standards, coupled with limited public availability of data and code. In this paper, we present the first systematic classification framework for lane-level traffic prediction, offering a structured taxonomy and analysis of existing methods. We construct three representative datasets from two real-world road networks, covering both regular and irregular lane configurations, and make them publicly available to support future research. We further establishes a unified spatial topology structure and prediction task formulation, and proposes a simple yet effective baseline model, GraphMLP, based on graph structure and MLP networks. This unified framework enables consistent evaluation across datasets and modeling paradigms. We also reproduce previously unavailable code from existing studies and conduct extensive experiments to assess a range of models in terms of accuracy, efficiency, and applicability, providing the first benchmark that jointly considers predictive performance and training cost for lane-level traffic scenarios. All datasets and code are released at https://github.com/ShuhaoLii/LaneLevel-Traffic-Benchmark.

IRMar 20
GEM: A Native Graph-based Index for Multi-Vector Retrieval

Yao Tian, Zhoujin Tian, Xi Zhao et al.

In multi-vector retrieval, both queries and data are represented as sets of high-dimensional vectors, enabling finer-grained semantic matching and improving retrieval quality over single-vector approaches. However, its practical adoption is held back by the lack of effective indexing algorithms. Existing work, attempting to reuse standard single-vector indexes, often fails to preserve multi-vector semantics or remains slow. In this work, we present GEM, a native indexing framework for multi-vector representations. The core idea is to construct a proximity graph directly over vector sets, preserving their fine-grained semantics while enabling efficient navigation. First, GEM designs a set-level clustering scheme. It associates each vector set with only its most informative clusters, effectively reducing redundancy without hurting semantic coverage. Then, it builds local proximity graphs within clusters and bridges them into a globally navigable structure. To handle the non-metric nature of multi-vector similarity, GEM decouples the graph construction metric from the final relevance score and injects semantic shortcuts to guide efficient navigation toward relevant regions. At query time, GEM launches beam search from multiple entry points and prunes paths early using cluster cues. To further enhance efficiency, a quantized distance estimation technique is used for both indexing and search. Across in-domain, out-of-domain, and multi-modal benchmarks, GEM achieves up to 16x speedup over state-of-the-art methods while matching or improving accuracy.

SIFeb 7, 2024Code
Adaptive Hypergraph Network for Trust Prediction

Rongwei Xu, Guanfeng Liu, Yan Wang et al.

Trust plays an essential role in an individual's decision-making. Traditional trust prediction models rely on pairwise correlations to infer potential relationships between users. However, in the real world, interactions between users are usually complicated rather than pairwise only. Hypergraphs offer a flexible approach to modeling these complex high-order correlations (not just pairwise connections), since hypergraphs can leverage hyperedeges to link more than two nodes. However, most hypergraph-based methods are generic and cannot be well applied to the trust prediction task. In this paper, we propose an Adaptive Hypergraph Network for Trust Prediction (AHNTP), a novel approach that improves trust prediction accuracy by using higher-order correlations. AHNTP utilizes Motif-based PageRank to capture high-order social influence information. In addition, it constructs hypergroups from both node-level and structure-level attributes to incorporate complex correlation information. Furthermore, AHNTP leverages adaptive hypergraph Graph Convolutional Network (GCN) layers and multilayer perceptrons (MLPs) to generate comprehensive user embeddings, facilitating trust relationship prediction. To enhance model generalization and robustness, we introduce a novel supervised contrastive learning loss for optimization. Extensive experiments demonstrate the superiority of our model over the state-of-the-art approaches in terms of trust prediction accuracy. The source code of this work can be accessed via https://github.com/Sherry-XU1995/AHNTP.

CLJul 21, 2025Code
Reading Between the Timelines: RAG for Answering Diachronic Questions

Kwun Hang Lau, Ruiyuan Zhang, Weijie Shi et al.

While Retrieval-Augmented Generation (RAG) excels at injecting static, factual knowledge into Large Language Models (LLMs), it exhibits a critical deficit in handling longitudinal queries that require tracking entities and phenomena across time. This blind spot arises because conventional, semantically-driven retrieval methods are not equipped to gather evidence that is both topically relevant and temporally coherent for a specified duration. We address this challenge by proposing a new framework that fundamentally redesigns the RAG pipeline to infuse temporal logic. Our methodology begins by disentangling a user's query into its core subject and its temporal window. It then employs a specialized retriever that calibrates semantic matching against temporal relevance, ensuring the collection of a contiguous evidence set that spans the entire queried period. To enable rigorous evaluation of this capability, we also introduce the Analytical Diachronic Question Answering Benchmark (ADQAB), a challenging evaluation suite grounded in a hybrid corpus of real and synthetic financial news. Empirical results on ADQAB show that our approach yields substantial gains in answer accuracy, surpassing standard RAG implementations by 13% to 27%. This work provides a validated pathway toward RAG systems capable of performing the nuanced, evolutionary analysis required for complex, real-world questions. The dataset and code for this study are publicly available at https://github.com/kwunhang/TA-RAG.

CLApr 17, 2025Code
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training

Weijie Shi, Jipeng Zhang, Yaguang Wu et al.

Large language models (LLMs) are commonly trained on multi-domain datasets, where domain sampling strategies significantly impact model performance due to varying domain importance across downstream tasks. Existing approaches for optimizing domain-level sampling strategies struggle with maintaining intra-domain consistency and accurately measuring domain impact. In this paper, we present Domain Impact-aware Data Sampling (DIDS). To ensure intra-domain consistency, a gradient clustering algorithm is proposed to group training data based on their learning effects, where a proxy language model and dimensionality reduction are employed to reduce computational overhead. To accurately measure domain impact, we develop a Fisher Information Matrix (FIM) guided metric that quantifies how domain-specific parameter updates affect the model's output distributions on downstream tasks, with theoretical guarantees. Furthermore, to determine optimal sampling ratios, DIDS combines both the FIM-guided domain impact assessment and loss learning trajectories that indicate domain-specific potential, while accounting for diminishing marginal returns. Extensive experiments demonstrate that DIDS achieves 3.4% higher average performance while maintaining comparable training efficiency. The code is available at https://github.com/shiweijiezero/DIDS.

CLMay 8
SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

Weijie Shi, Qiang Xu, Fan Deng et al.

Speculative decoding accelerates LLM inference by drafting a tree of candidate continuations and verifying it in one target forward. Existing drafters fall into two camps with opposite weaknesses. Autoregressive drafters such as EAGLE-3 preserve dependence along each draft path but call the drafter once per tree depth, making drafting a non-trivial share of per-iteration latency. Parallel drafters cut drafter calls by predicting multiple future positions in one forward, but each position is predicted without seeing the others, producing paths the verifier rejects. In this paper, we propose SpecBlock, a block-iterative drafter that combines path dependence with cheap drafting. Each drafter forward produces K dependent positions and we call this a block. The draft tree grows through repeated block expansions. Two mechanisms explicitly carry path dependence to keep later draft positions accurate. Within each block, a layer-wise shift carries the previous position's hidden state into every decoder layer. Across blocks, each new block can start from any position of the previous block, inheriting its hidden state to extend the path. To spend verifier budget where acceptance is likely, a co-trained rank head replaces the fixed top-k tree by allocating per-position branching during drafting. To avoid training the drafter on prefixes it never produces at inference, a valid-prefix mask drops the loss at later positions once an earlier one is wrong. Beyond static drafting, a cost-aware bandit at deployment uses free verifier feedback to update the drafter selectively, only when the expected throughput gain exceeds the update cost. Experiments show that SpecBlock improves mean speedup by 8-13% over EAGLE-3 at 44-52% of its drafting cost, and cost-aware adaptation extends this lead to 11-19%.

AIJan 24, 2025
Top Ten Challenges Towards Agentic Neural Graph Databases

Jiaxin Bai, Zihao Wang, Yukun Zhou et al. · tsinghua

Graph databases (GDBs) like Neo4j and TigerGraph excel at handling interconnected data but lack advanced inference capabilities. Neural Graph Databases (NGDBs) address this by integrating Graph Neural Networks (GNNs) for predictive analysis and reasoning over incomplete or noisy data. However, NGDBs rely on predefined queries and lack autonomy and adaptability. This paper introduces Agentic Neural Graph Databases (Agentic NGDBs), which extend NGDBs with three core functionalities: autonomous query construction, neural query execution, and continuous learning. We identify ten key challenges in realizing Agentic NGDBs: semantic unit representation, abductive reasoning, scalable query execution, and integration with foundation models like large language models (LLMs). By addressing these challenges, Agentic NGDBs can enable intelligent, self-improving systems for modern data-driven applications, paving the way for adaptable and autonomous data management solutions.

LGNov 4, 2024
ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer

Jiawen Zhang, Shun Zheng, Xumeng Wen et al.

Numerous industrial sectors necessitate models capable of providing robust forecasts across various horizons. Despite the recent strides in crafting specific architectures for time-series forecasting and developing pre-trained universal models, a comprehensive examination of their capability in accommodating varied-horizon forecasting during inference is still lacking. This paper bridges this gap through the design and evaluation of the Elastic Time-Series Transformer (ElasTST). The ElasTST model incorporates a non-autoregressive design with placeholders and structured self-attention masks, warranting future outputs that are invariant to adjustments in inference horizons. A tunable version of rotary position embedding is also integrated into ElasTST to capture time-series-specific periods and enhance adaptability to different horizons. Additionally, ElasTST employs a multi-scale patch design, effectively integrating both fine-grained and coarse-grained information. During the training phase, ElasTST uses a horizon reweighting strategy that approximates the effect of random sampling across multiple horizons with a single fixed horizon setting. Through comprehensive experiments and comparisons with state-of-the-art time-series architectures and contemporary foundation models, we demonstrate the efficacy of ElasTST's unique design elements. Our findings position ElasTST as a robust solution for the practical necessity of varied-horizon forecasting.

LGJan 7, 2024
Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs

Pengfei Ding, Yan Wang, Guanfeng Liu et al.

Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from rich-labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes with few-labeled training data, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. However, in practice, distribution shifts among these three types of data are inevitable due to two reasons: (1) the limited availability of the source HG that matches the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, thereby leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes with few-labeled data. Extensive experiments on seven real-world datasets have demonstrated the superior performance of COHF over the state-of-the-art methods.

CLApr 13
Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

Yuqian Wu, Wei Chen, Zhengjun Huang et al.

Existing conversational memory systems rely on complex hierarchical summarization or reinforcement learning to manage long-term dialogue history, yet remain vulnerable to context dilution as conversations grow. In this work, we offer a different perspective: the primary bottleneck may lie not in memory architecture, but in the \textit{Signal Sparsity Effect} within the latent knowledge manifold. Through controlled experiments, we identify two key phenomena: \textit{Decisive Evidence Sparsity}, where relevant signals become increasingly isolated with longer sessions, leading to sharp degradation in aggregation-based methods; and \textit{Dual-Level Redundancy}, where both inter-session interference and intra-session conversational filler introduce large amounts of non-informative content, hindering effective generation. Motivated by these insights, we propose \method, a minimalist framework that brings conversational memory back to basics, relying solely on retrieval and generation via Turn Isolation Retrieval (TIR) and Query-Driven Pruning (QDP). TIR replaces global aggregation with a max-activation strategy to capture turn-level signals, while QDP removes redundant sessions and conversational filler to construct a compact, high-density evidence set. Extensive experiments on multiple benchmarks demonstrate that \method achieves robust performance across diverse settings, consistently outperforming strong baselines while maintaining high efficiency in tokens and latency, establishing a new minimalist baseline for conversational memory.

CLMay 28, 2025
Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists

Yue Cui, Liuyi Yao, Shuchang Tao et al.

Large language models (LLMs) have significantly advanced natural language processing, particularly through the integration of external tools and APIs. However, their effectiveness is frequently hampered by parameter mis-filling during tool calling. In this paper, we propose the Hierarchical Tool Error Checklist (HiTEC) framework to systematically diagnose and mitigate tool-calling errors without relying on extensive real-world interactions. HiTEC introduces a two-tiered approach: a global error checklist that identifies common, cross-tool issues, and a local error checklist that targets tool-specific and contextual failures. Building on this structure, we propose two deployments: HiTEC-In Context Learning (HiTEC-ICL) and HiTEC-Kahneman-Tversky Optimization (HiTEC-KTO). HiTEC-ICL embeds the global checklist in the initial prompts and leverages a two-round conversational interaction to dynamically refine parameter handling, while HiTEC-KTO generates high-quality negative examples to drive fine-tuning via preference-based optimization. Extensive experiments across five public datasets demonstrate that our framework significantly improves parameter-filling accuracy and tool-calling success rates compared to baseline methods.

LGJun 2, 2025
Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review

Yuchen Fang, Hao Miao, Yuxuan Liang et al.

Spatio-temporal deep learning models aims to utilize useful patterns in such data to support tasks like prediction. However, previous deep learning models designed for specific tasks typically require separate training for each use case, leading to increased computational and storage costs. To address this issue, spatio-temporal foundation models have emerged, offering a unified framework capable of solving multiple spatio-temporal tasks. These foundation models achieve remarkable success by learning general knowledge with spatio-temporal data or transferring the general capabilities of pre-trained language models. While previous surveys have explored spatio-temporal data and methodologies separately, they have ignored a comprehensive examination of how foundation models are designed, selected, pre-trained, and adapted. As a result, the overall pipeline for spatio-temporal foundation models remains unclear. To bridge this gap, we innovatively provide an up-to-date review of previous spatio-temporal foundation models from the pipeline perspective. The pipeline begins with an introduction to different types of spatio-temporal data, followed by details of data preprocessing and embedding techniques. The pipeline then presents a novel data property taxonomy to divide existing methods according to data sources and dependencies, providing efficient and effective model design and selection for researchers. On this basis, we further illustrate the training objectives of primitive models, as well as the adaptation techniques of transferred models. Overall, our survey provides a clear and structured pipeline to understand the connection between core elements of spatio-temporal foundation models while guiding researchers to get started quickly. Additionally, we introduce emerging opportunities such as multi-objective training in the field of spatio-temporal foundation models.

LGFeb 2, 2024
An Auction-based Marketplace for Model Trading in Federated Learning

Yue Cui, Liuyi Yao, Yaliang Li et al.

Federated learning (FL) is increasingly recognized for its efficacy in training models using locally distributed data. However, the proper valuation of shared data in this collaborative process remains insufficiently addressed. In this work, we frame FL as a marketplace of models, where clients act as both buyers and sellers, engaging in model trading. This FL market allows clients to gain monetary reward by selling their own models and improve local model performance through the purchase of others' models. We propose an auction-based solution to ensure proper pricing based on performance gain. Incentive mechanisms are designed to encourage clients to truthfully reveal their model valuations. Furthermore, we introduce a reinforcement learning (RL) framework for marketing operations, aiming to achieve maximum trading volumes under the dynamic and evolving market status. Experimental results on four datasets demonstrate that the proposed FL market can achieve high trading revenue and fair downstream task accuracy.

QMApr 7
From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning

Chuang Zhao, Hongke Zhao, Xiaofang Zhou et al.

Contextual clinical reasoning demands robust inference grounded in complex, heterogeneous clinical records. While state-of-the-art fine-tuning, in-context learning (ICL), and retrieval-augmented generation (RAG) enable knowledge exposure, they often fall short of genuine contextual internalization: dynamically adjusting a model's internal representations to the subtle nuances of individual cases at inference time. To address this, we propose Dual-Stream Calibration (DSC), a test-time training framework that transcends superficial knowledge exposure to achieve deep internalization during inference. DSC facilitates input internalization by synergistically aligning two calibration streams. Unlike passive context exposure, the Semantic Calibration Stream enforces a deliberative reflection on core evidence, internalizing semantic anchors by minimizing entropy to stabilize generative trajectories. Simultaneously, the Structural Calibration Stream assimilates latent inferential dependencies through an iterative meta-learning objective. By training on specialized support sets at test-time, this stream enables the model to bridge the gap between external evidence and internal logic, synthesizing fragmented data into a coherent response. Our approach shifts the reasoning paradigm from passive attention-based matching to an active refinement of the latent inferential space. Validated against thirteen clinical datasets, DSC demonstrates superiority across three distinct task paradigms, consistently outstripping state-of-the-art baselines ranging from training-dependent models to test-time learning frameworks.

DBApr 6
Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing

Zhonghan Chen, Qintian Guo, Ruiyuan Zhang et al.

In this work, we address the problem of cardinality estimation for similarity search in high-dimensional spaces. Our goal is to design a framework that is lightweight, easy to construct, and capable of providing accurate estimates with satisfying online efficiency. We leverage locality-sensitive hashing (LSH) to partition the vector space while preserving distance proximity. Building on this, we adopt the principles of classical multi-probe LSH to adaptively explore neighboring buckets, accounting for distance thresholds of varying magnitudes. To improve online efficiency, we employ progressive sampling to reduce the number of distance computations and utilize asymmetric distance computation in product quantization to accelerate distance calculations in high-dimensional spaces. In addition to handling static datasets, our framework includes updating algorithm designed to efficiently support large-scale dynamic scenarios of data updates.Experiments demonstrate that our methods can accurately estimate the cardinality of similarity queries, yielding satisfying efficiency.

CLMay 22, 2025
ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects

Jipeng Zhang, Haolin Yang, Kehao Miao et al.

Recent text-to-SQL models have achieved strong performance, but their effectiveness remains largely confined to SQLite due to dataset limitations. However, real-world applications require SQL generation across multiple dialects with varying syntax and specialized features, which remains a challenge for current models. The main obstacle in building a dialect-aware model lies in acquiring high-quality dialect-specific data. Data generated purely through static prompting - without validating SQLs via execution - tends to be noisy and unreliable. Moreover, the lack of real execution environments in the training loop prevents models from grounding their predictions in executable semantics, limiting generalization despite surface-level improvements from data filtering. This work introduces ExeSQL, a text-to-SQL framework with execution-driven, agentic bootstrapping. The method consists of iterative query generation, execution-based filtering (e.g., rejection sampling), and preference-based training, enabling the model to adapt to new SQL dialects through verifiable, feedback-guided learning. Experiments show that ExeSQL bridges the dialect gap in text-to-SQL, achieving average improvements of 15.2%, 10.38%, and 4.49% over GPT-4o on PostgreSQL, MySQL, and Oracle, respectively, across multiple datasets of varying difficulty.

CLMay 18, 2025
Towards DS-NER: Unveiling and Addressing Latent Noise in Distant Annotations

Yuyang Ding, Dan Qiao, Juntao Li et al.

Distantly supervised named entity recognition (DS-NER) has emerged as a cheap and convenient alternative to traditional human annotation methods, enabling the automatic generation of training data by aligning text with external resources. Despite the many efforts in noise measurement methods, few works focus on the latent noise distribution between different distant annotation methods. In this work, we explore the effectiveness and robustness of DS-NER by two aspects: (1) distant annotation techniques, which encompasses both traditional rule-based methods and the innovative large language model supervision approach, and (2) noise assessment, for which we introduce a novel framework. This framework addresses the challenges by distinctly categorizing them into the unlabeled-entity problem (UEP) and the noisy-entity problem (NEP), subsequently providing specialized solutions for each. Our proposed method achieves significant improvements on eight real-world distant supervision datasets originating from three different data sources and involving four distinct annotation techniques, confirming its superiority over current state-of-the-art methods.

AINov 17, 2025
Grounded by Experience: Generative Healthcare Prediction Augmented with Hierarchical Agentic Retrieval

Chuang Zhao, Hui Tang, Hongke Zhao et al.

Accurate healthcare prediction is critical for improving patient outcomes and reducing operational costs. Bolstered by growing reasoning capabilities, large language models (LLMs) offer a promising path to enhance healthcare predictions by drawing on their rich parametric knowledge. However, LLMs are prone to factual inaccuracies due to limitations in the reliability and coverage of their embedded knowledge. While retrieval-augmented generation (RAG) frameworks, such as GraphRAG and its variants, have been proposed to mitigate these issues by incorporating external knowledge, they face two key challenges in the healthcare scenario: (1) identifying the clinical necessity to activate the retrieval mechanism, and (2) achieving synergy between the retriever and the generator to craft contextually appropriate retrievals. To address these challenges, we propose GHAR, a \underline{g}enerative \underline{h}ierarchical \underline{a}gentic \underline{R}AG framework that simultaneously resolves when to retrieve and how to optimize the collaboration between submodules in healthcare. Specifically, for the first challenge, we design a dual-agent architecture comprising Agent-Top and Agent-Low. Agent-Top acts as the primary physician, iteratively deciding whether to rely on parametric knowledge or to initiate retrieval, while Agent-Low acts as the consulting service, summarising all task-relevant knowledge once retrieval was triggered. To tackle the second challenge, we innovatively unify the optimization of both agents within a formal Markov Decision Process, designing diverse rewards to align their shared goal of accurate prediction while preserving their distinct roles. Extensive experiments on three benchmark datasets across three popular tasks demonstrate our superiority over state-of-the-art baselines, highlighting the potential of hierarchical agentic RAG in advancing healthcare systems.

LGNov 21, 2025
CroTad: A Contrastive Reinforcement Learning Framework for Online Trajectory Anomaly Detection

Rui Xue, Dan He, Fengmei Jin et al.

Detecting trajectory anomalies is a vital task in modern Intelligent Transportation Systems (ITS), enabling the identification of unsafe, inefficient, or irregular travel behaviours. While deep learning has emerged as the dominant approach, several key challenges remain unresolved. First, sub-trajectory anomaly detection, capable of pinpointing the precise segments where anomalies occur, remains underexplored compared to whole-trajectory analysis. Second, many existing methods depend on carefully tuned thresholds, limiting their adaptability in real-world applications. Moreover, the irregular sampling of trajectory data and the presence of noise in training sets further degrade model performance, making it difficult to learn reliable representations of normal routes. To address these challenges, we propose a contrastive reinforcement learning framework for online trajectory anomaly detection, CroTad. Our method is threshold-free and robust to noisy, irregularly sampled data. By incorporating contrastive learning, CroTad learns to extract diverse normal travel patterns for different itineraries and effectively distinguish anomalous behaviours at both sub-trajectory and point levels. The detection module leverages deep reinforcement learning to perform online, real-time anomaly scoring, enabling timely and fine-grained identification of abnormal segments. Extensive experiments on two real-world datasets demonstrate the effectiveness and robustness of our framework across various evaluation scenarios.

CVSep 19, 2025
Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models

Renjie Pi, Kehao Miao, Li Peihang et al.

Multimodal large language models (MLLMs) have demonstrated extraordinary capabilities in conducting conversations based on image inputs. However, we observe that MLLMs exhibit a pronounced form of visual sycophantic behavior. While similar behavior has also been noted in text-based large language models (LLMs), it becomes significantly more prominent when MLLMs process image inputs. We refer to this phenomenon as the "sycophantic modality gap." To better understand this issue, we further analyze the factors that contribute to the exacerbation of this gap. To mitigate the visual sycophantic behavior, we first experiment with naive supervised fine-tuning to help the MLLM resist misleading instructions from the user. However, we find that this approach also makes the MLLM overly resistant to corrective instructions (i.e., stubborn even if it is wrong). To alleviate this trade-off, we propose Sycophantic Reflective Tuning (SRT), which enables the MLLM to engage in reflective reasoning, allowing it to determine whether a user's instruction is misleading or corrective before drawing a conclusion. After applying SRT, we observe a significant reduction in sycophantic behavior toward misleading instructions, without resulting in excessive stubbornness when receiving corrective instructions.

AIAug 8, 2025
Zero-Shot Cellular Trajectory Map Matching

Weijie Shi, Yue Cui, Hao Chen et al.

Cellular Trajectory Map-Matching (CTMM) aims to align cellular location sequences to road networks, which is a necessary preprocessing in location-based services on web platforms like Google Maps, including navigation and route optimization. Current approaches mainly rely on ID-based features and region-specific data to learn correlations between cell towers and roads, limiting their adaptability to unexplored areas. To enable high-accuracy CTMM without additional training in target regions, Zero-shot CTMM requires to extract not only region-adaptive features, but also sequential and location uncertainty to alleviate positioning errors in cellular data. In this paper, we propose a pixel-based trajectory calibration assistant for zero-shot CTMM, which takes advantage of transferable geospatial knowledge to calibrate pixelated trajectory, and then guide the path-finding process at the road network level. To enhance knowledge sharing across similar regions, a Gaussian mixture model is incorporated into VAE, enabling the identification of scenario-adaptive experts through soft clustering. To mitigate high positioning errors, a spatial-temporal awareness module is designed to capture sequential features and location uncertainty, thereby facilitating the inference of approximate user positions. Finally, a constrained path-finding algorithm is employed to reconstruct the road ID sequence, ensuring topological validity within the road network. This process is guided by the calibrated trajectory while optimizing for the shortest feasible path, thus minimizing unnecessary detours. Extensive experiments demonstrate that our model outperforms existing methods in zero-shot CTMM by 16.8\%.

LGJul 21, 2025
Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design

Jialiang Wang, Hanmo Liu, Shimin Di et al.

Database systems have recently advocated for embedding machine learning (ML) capabilities, offering declarative model queries over large, managed model repositories, thereby circumventing the huge computational overhead of traditional ML-based algorithms in automated neural network model selection. Pioneering database studies aim to organize existing benchmark repositories as model bases (MB), querying them for the model records with the highest performance estimation metrics for given tasks. However, this static model selection practice overlooks the fine-grained, evolving relational dependencies between diverse task queries and model architecture variations, resulting in suboptimal matches and failing to further refine the model effectively. To fill the model refinement gap in database research, we propose M-DESIGN, a curated model knowledge base (MKB) pipeline for mastering neural network refinement by adaptively weaving prior insights about model architecture modification. First, we propose a knowledge weaving engine that reframes model refinement as an adaptive query problem over task metadata. Given a user's task query, M-DESIGN quickly matches and iteratively refines candidate models by leveraging a graph-relational knowledge schema that explicitly encodes data properties, architecture variations, and pairwise performance deltas as joinable relations. This schema supports fine-grained relational analytics over architecture tweaks and drives a predictive query planner that can detect and adapt to out-of-distribution (OOD) tasks. We instantiate M-DESIGN for graph analytics tasks, where our model knowledge base enriches existing benchmarks with structured metadata covering 3 graph tasks and 22 graph datasets, contributing data records of 67,760 graph models. Empirical results demonstrate that M-DESIGN delivers the optimal model in 26 of 33 data-task pairs within limited budgets.

CLJun 30, 2025
Semantic-guided Diverse Decoding for Large Language Model

Weijie Shi, Yue Cui, Yaguang Wu et al.

Diverse decoding of large language models is crucial for applications requiring multiple semantically distinct responses, yet existing methods primarily achieve lexical rather than semantic diversity. This limitation significantly constrains Best-of-N strategies, group-based reinforcement learning, and data synthesis. While temperature sampling and diverse beam search modify token distributions or apply n-gram penalties, they fail to ensure meaningful semantic differentiation. We introduce Semantic-guided Diverse Decoding (SemDiD), operating directly in embedding space that balances quality with diversity through three complementary mechanisms: orthogonal directional guidance, dynamic inter-group repulsion, and position-debiased probability assessment. SemDiD harmonizes these competing objectives using adaptive gain functions and constraint optimization, ensuring both quality thresholds and maximal semantic differentiation. Experiments show SemDiD consistently outperforms existing methods, improving Best-of-N coverage by 1.4-5.2% across diverse tasks and accelerating RLHF training convergence by 15% while increasing accuracy by up to 2.1%.

AIFeb 28, 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation

Xujie Yuan, Yongxu Liu, Shimin Di et al.

The integration of Knowledge Graphs (KGs) into the Retrieval Augmented Generation (RAG) framework has attracted significant interest, with early studies showing promise in mitigating hallucinations and improving model accuracy. However, a systematic understanding and comparative analysis of the rapidly emerging KG-RAG methods are still lacking. This paper seeks to lay the foundation for systematically answering the question of when and how to use KG-RAG by analyzing their performance in various application scenarios associated with different technical configurations. After outlining the mind map using KG-RAG framework and summarizing its popular pipeline, we conduct a pilot empirical study of KG-RAG works to reimplement and evaluate 6 KG-RAG methods across 9 datasets in diverse domains and scenarios, analyzing the impact of 9 KG-RAG configurations in combination with 17 LLMs, and combining Metacognition with KG-RAG as a pilot attempt. Our results underscore the critical role of appropriate application conditions and optimal configurations of KG-RAG components.

LGFeb 23, 2024
A Bargaining-based Approach for Feature Trading in Vertical Federated Learning

Yue Cui, Liuyi Yao, Zitao Li et al.

Vertical Federated Learning (VFL) has emerged as a popular machine learning paradigm, enabling model training across the data and the task parties with different features about the same user set while preserving data privacy. In production environment, VFL usually involves one task party and one data party. Fair and economically efficient feature trading is crucial to the commercialization of VFL, where the task party is considered as the data consumer who buys the data party's features. However, current VFL feature trading practices often price the data party's data as a whole and assume transactions occur prior to the performing VFL. Neglecting the performance gains resulting from traded features may lead to underpayment and overpayment issues. In this study, we propose a bargaining-based feature trading approach in VFL to encourage economically efficient transactions. Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties. We analyze the proposed bargaining model under perfect and imperfect performance information settings, proving the existence of an equilibrium that optimizes the parties' objectives. Moreover, we develop performance gain estimation-based bargaining strategies for imperfect performance information scenarios and discuss potential security issues and solutions. Experiments on three real-world datasets demonstrate the effectiveness of the proposed bargaining model.

AIJan 2, 2022
Informed Multi-context Entity Alignment

Kexuan Xin, Zequn Sun, Wen Hua et al.

Entity alignment is a crucial step in integrating knowledge graphs (KGs) from multiple sources. Previous attempts at entity alignment have explored different KG structures, such as neighborhood-based and path-based contexts, to learn entity embeddings, but they are limited in capturing the multi-context features. Moreover, most approaches directly utilize the embedding similarity to determine entity alignment without considering the global interaction among entities and relations. In this work, we propose an Informed Multi-context Entity Alignment (IMEA) model to address these issues. In particular, we introduce Transformer to flexibly capture the relation, path, and neighborhood contexts, and design holistic reasoning to estimate alignment probabilities based on both embedding similarity and the relation/entity functionality. The alignment evidence obtained from holistic reasoning is further injected back into the Transformer via the proposed soft label editing to inform embedding learning. Experimental results on several benchmark datasets demonstrate the superiority of our IMEA model compared with existing state-of-the-art entity alignment methods.

IRNov 20, 2021
Edge-Enhanced Global Disentangled Graph Neural Network for Sequential Recommendation

Yunyi Li, Pengpeng Zhao, Guanfeng Liu et al.

Sequential recommendation has been a widely popular topic of recommender systems. Existing works have contributed to enhancing the prediction ability of sequential recommendation systems based on various methods, such as recurrent networks and self-attention mechanisms. However, they fail to discover and distinguish various relationships between items, which could be underlying factors which motivate user behaviors. In this paper, we propose an Edge-Enhanced Global Disentangled Graph Neural Network (EGD-GNN) model to capture the relation information between items for global item representation and local user intention learning. At the global level, we build a global-link graph over all sequences to model item relationships. Then a channel-aware disentangled learning layer is designed to decompose edge information into different channels, which can be aggregated to represent the target item from its neighbors. At the local level, we apply a variational auto-encoder framework to learn user intention over the current sequence. We evaluate our proposed method on three real-world datasets. Experimental results show that our model can get a crucial improvement over state-of-the-art baselines and is able to distinguish item features.

IRNov 20, 2021
Quaternion-Based Graph Convolution Network for Recommendation

Yaxing Fang, Pengpeng Zhao, Guanfeng Liu et al.

Graph Convolution Network (GCN) has been widely applied in recommender systems for its representation learning capability on user and item embeddings. However, GCN is vulnerable to noisy and incomplete graphs, which are common in real world, due to its recursive message propagation mechanism. In the literature, some work propose to remove the feature transformation during message propagation, but making it unable to effectively capture the graph structural features. Moreover, they model users and items in the Euclidean space, which has been demonstrated to have high distortion when modeling complex graphs, further degrading the capability to capture the graph structural features and leading to sub-optimal performance. To this end, in this paper, we propose a simple yet effective Quaternion-based Graph Convolution Network (QGCN) recommendation model. In the proposed model, we utilize the hyper-complex Quaternion space to learn user and item representations and feature transformation to improve both performance and robustness. Specifically, we first embed all users and items into the Quaternion space. Then, we introduce the quaternion embedding propagation layers with quaternion feature transformation to perform message propagation. Finally, we combine the embeddings generated at each layer with the mean pooling strategy to obtain the final embeddings for recommendation. Extensive experiments on three public benchmark datasets demonstrate that our proposed QGCN model outperforms baseline methods by a large margin.