Chaorui Zhang

h-index13

6papers

940citations

Novelty42%

AI Score34

Ranked #110,968 of 194,257 authors (top 57%)#1,152 in IR (top 53%)

6 Papers

6.4LGSep 1, 2024Code

Towards Faster Graph Partitioning via Pre-training and Inductive Inference

Meng Qin, Chaorui Zhang, Yu Gao et al.

Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep graph learning (DGL) model on small synthetic graphs with various topology properties. By using the inductive inference of DGL, one can directly generalize the pre-trained model (with frozen model parameters) to large graphs and derive feasible GP results. We also use the derived partition as a good initialization of an efficient GP method (e.g., InfoMap) to further refine the quality of partitioning. In this setting, the online generalization and refinement of PR-GPT can not only benefit from the transfer ability regarding quality but also ensure high inference efficiency without re-training. Based on a mechanism of reducing the scale of a graph to be processed by the refinement method, PR-GPT also has the potential to support streaming GP. Experiments on the Graph Challenge benchmark demonstrate that PR-GPT can ensure faster GP on large-scale graphs without significant quality degradation, compared with running a refinement method from scratch. We will make our code public at https://github.com/KuroginQin/PRGPT.

1.2SISep 29, 2022

Trading off Quality for Efficiency of Community Detection: An Inductive Method across Graphs

Meng Qin, Chaorui Zhang, Bo Bai et al.

Many network applications can be formulated as NP-hard combinatorial optimization problems of community detection (CD). Due to the NP-hardness, to balance the CD quality and efficiency remains a challenge. Most existing CD methods are transductive, which are independently optimized only for the CD on a single graph. Some of these methods use advanced machine learning techniques to obtain high-quality CD results but usually have high complexity. Other approaches use fast heuristic approximation to ensure low runtime but may suffer from quality degradation. In contrast to these transductive methods, we propose an alternative inductive community detection (ICD) method across graphs of a system or scenario to alleviate the NP-hard challenge. ICD first conducts the offline training of an adversarial dual GNN on historical graphs to capture key properties of the system. The trained model is then directly generalized to new unseen graphs for online CD without additional optimization, where a better trade-off between quality and efficiency can be achieved. ICD can also capture the permutation invariant community labels in the offline training and tackle the online CD on new graphs with non-fixed number of nodes and communities. Experiments on a set of benchmarks demonstrate that ICD can achieve a significant trade-off between quality and efficiency over various baselines.

2.7CLJun 16

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

Peixian Zhou, Yuxu Chen, Chaorui Zhang et al.

Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English--Chinese aligned benchmark that tests whether models preserve logical reasoning performance when the same latent logical structure is expressed in English and diverse Chinese surface realizations. Built from formal logical templates, the benchmark contains three data sets: (i) the General aligned set, derived from 60 General Propositions across nine template families; (ii) the Difficult aligned set, derived from 40 Difficult Problems; and (iii) the Chinese-only set, covering 15 language-specific phenomenon types. Each aligned item pairs one English reference expression with five Chinese realizations. Experiments on Qwen3, Ministral, and GLM models reveal a persistent English--Chinese performance gap. Back-translation from standard Chinese into English often improves performance on the General aligned set, but produces mixed effects on the Difficult aligned set, where Qwen3-32B and GLM-5.1 perform worse after translation. These results indicate that Chinese surface realization, translation artifacts, and model-specific behavior jointly affect multilingual logical reasoning. Overall, ChLogic provides a useful stress test for the robustness of multilingual reasoning.

6.3IRAug 12, 2025

DB3 Team's Solution For Meta KDD Cup' 25

Yikuan Xia, Jiazun Chen, Yirui Zhan et al.

This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark (CRAG-MM), we developed a comprehensive framework that integrates tailored retrieval pipelines for different tasks with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-specific retrieval pipelines handling image-indexed knowledge graphs, web sources, and multi-turn conversations; and (2) advanced refusal training using SFT, DPO, and RL. The system achieved 2nd place in Task 1, 2nd place in Task 2, and 1st place in Task 3, securing the grand prize for excellence in ego-centric queries through superior handling of first-person perspective challenges.

3.6IRMar 2, 2025

ER-RAG: Enhance RAG with ER-Based Unified Modeling of Heterogeneous Data Sources

Yikuan Xia, Jiazun Chen, Yirui Zhan et al.

Large language models (LLMs) excel in question-answering (QA) tasks, and retrieval-augmented generation (RAG) enhances their precision by incorporating external evidence from diverse sources like web pages, databases, and knowledge graphs. However, current RAG methods rely on agent-specific strategies for individual data sources, posing challenges low-resource or black-box environments and complicates operations when evidence is fragmented across sources. To address these limitations, we propose ER-RAG, a framework that unifies evidence integration across heterogeneous data sources using the Entity-Relationship (ER) model. ER-RAG standardizes entity retrieval and relationship querying through ER-based APIs with GET and JOIN operations. It employs a two-stage generation process: first, a preference optimization module selects optimal sources; second, another module constructs API chains based on source schemas. This unified approach allows efficient fine-tuning and seamless integration across diverse data sources. ER-RAG demonstrated its effectiveness by winning all three tracks of the 2024 KDDCup CRAG Challenge, achieving performance on par with commercial RAG pipelines using an 8B LLM backbone. It outperformed hybrid competitors by 3.1% in LLM score and accelerated retrieval by 5.5X.

1.1CVJun 7, 2016

Enhanced high dynamic range 3D shape measurement based on generalized phase-shifting algorithm

Minmin Wang, Guangliang Du, Canlin Zhou et al.

It is a challenge for Phase Measurement Profilometry (PMP) to measure objects with a large range of reflectivity variation across the surface. Saturated or dark pixels in the deformed fringe patterns captured by the camera will lead to phase fluctuations and errors. Jiang et al. proposed a high dynamic range real-time 3D shape measurement method without changing camera exposures. Three inverted phase-shifted fringe patterns are used to complement three regular phase-shifted fringe patterns for phase retrieval when any of the regular fringe patterns are saturated. But Jiang's method still has some drawbacks: (1) The phases in saturated pixels are respectively estimated by different formulas for different cases. It is shortage of an universal formula; (2) it cannot be extended to four-step phase-shifting algorithm because inverted fringe patterns are the repetition of regular fringe patterns; (3) only three unsaturated intensity values at every pixel of fringe patterns are chosen for phase demodulation, lying idle the other unsaturated ones. We proposed a method for enhanced high dynamic range 3D shape measurement based on generalized phase-shifting algorithm, which combines the complementary technique of inverted and regular fringe patterns with generalized phase-shifting algorithm. Firstly, two sets of complementary phase-shifted fringe patterns, namely regular and inverted fringe patterns are projected and collected. Then all unsaturated intensity values at the same camera pixel from two sets of fringe patterns are selected, and employed to retrieve the phase by generalized phase-shifting algorithm. Finally, simulations and experiments are conducted to prove the validity of the proposed method. The results are analyzed and compared with Jiang's method, which demonstrate that the proposed method not only expands the scope of Jiang's method, but also improves the measurement accuracy.