LG CL

Jun 1, 2025

Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment

Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov

Tsinghua, UW

arXiv:2506.00845v3

4.1

h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making LLMs more adaptable to real-world graph reasoning tasks beyond synthetic benchmarks, though it remains incremental with noted limitations in compositionality and explainability.

The paper tackles the problem of improving LLMs' generalization from synthetic graph data to real-world tasks with implicit graph structures by using post-training alignment with synthetic data, achieving an average gain of 12.9% over baselines on 5 datasets.

Previous research has sought to enhance the graph reasoning capabilities of LLMs by supervised fine-tuning on synthetic graph data. While this has produced specialized LLMs that are better at solving graph algorithm problems, we don't need LLMs for shortest path: we need generalization from synthetic graph data to real-world tasks with implicit graph structures. In this work, we propose to unlock generalizable learning from synthetic graph data through post-training alignment. We first design solution-based and process-based rewards for synthetic graph problems: instead of rigidly memorizing response patterns as in direct fine-tuning, we posit that post-training alignment helps LLMs grasp the essentials underlying graph reasoning and alleviates overfitting on synthetic data. We employ post-training alignment algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data. We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures, such as multi-hop QA and structured planning, among others. Extensive experiments demonstrate that our post-training alignment recipe leads to statistically significant improvement on 5 datasets, with an average gain of 12.9% over baseline settings. Further analysis reveals that process-based rewards consistently outperform solution-based rewards on synthetic data but not on real-world tasks, and that compositionality and explainable intermediate steps remain critical challenges even after post-training alignment.
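To make the distinction between the two reward types concrete, here is a minimal sketch for a synthetic shortest-path problem. It is an illustration under stated assumptions, not the paper's implementation: the function names, the dict-of-dicts graph format, the binary solution score, and the fraction-of-valid-steps process score are all hypothetical choices, since the abstract does not specify how either reward is computed.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra over a dict-of-dicts weighted graph; returns None if unreachable."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None


def solution_reward(graph, source, target, predicted_length):
    """Hypothetical solution-based reward: 1.0 only if the final answer is correct."""
    truth = shortest_path_length(graph, source, target)
    return 1.0 if predicted_length == truth else 0.0


def process_reward(graph, source, target, predicted_path):
    """Hypothetical process-based reward: fraction of intermediate checks passed.

    Checks: the path starts at source, ends at target, and every consecutive
    pair of nodes is an edge that actually exists in the graph.
    """
    if not predicted_path:
        return 0.0
    checks = [predicted_path[0] == source, predicted_path[-1] == target]
    for u, v in zip(predicted_path, predicted_path[1:]):
        checks.append(v in graph.get(u, {}))
    return sum(checks) / len(checks)


# Toy directed graph (made up for illustration).
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}
print(solution_reward(graph, "A", "C", 3))
print(process_reward(graph, "A", "C", ["A", "B", "C"]))
print(process_reward(graph, "A", "C", ["A", "C", "B"]))
```

In this sketch the solution-based score looks only at the final answer, while the process-based score gives partial credit for valid intermediate steps. Either scalar could in principle serve as the reward signal for an alignment algorithm such as GRPO, or be used to rank responses into preference pairs for DPO.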
