Runhao Zhao

AI
h-index15
5papers
12citations
Novelty48%
AI Score48

5 Papers

AIJan 15
Panning for Gold: Expanding Domain-Specific Knowledge Graphs with General Knowledge

Runhao Zhao, Weixin Zeng, Wentao Zhang et al.

Domain-specific knowledge graphs (DKGs) are critical yet often suffer from limited coverage compared to General Knowledge Graphs (GKGs). Existing tasks to enrich DKGs rely primarily on extracting knowledge from external unstructured data or completing KGs through internal reasoning, but the scope and quality of such integration remain limited. This highlights a critical gap: little systematic exploration has been conducted on how comprehensive, high-quality GKGs can be effectively leveraged to supplement DKGs. To address this gap, we propose a new and practical task: domain-specific knowledge graph fusion (DKGF), which aims to mine and integrate relevant facts from general knowledge graphs into domain-specific knowledge graphs to enhance their completeness and utility. Unlike previous research, this new task faces two key challenges: (1) high ambiguity of domain relevance, i.e., difficulty in determining whether knowledge from a GKG is truly relevant to the target domain , and (2) cross-domain knowledge granularity misalignment, i.e., GKG facts are typically abstract and coarse-grained, whereas DKGs frequently require more contextualized, fine-grained representations aligned with particular domain scenarios. To address these, we present ExeFuse, a neuro-symbolic framework based on a novel Fact-as-Program paradigm. ExeFuse treats fusion as an executable process, utilizing neuro-symbolic execution to infer logical relevance beyond surface similarity and employing target space grounding to calibrate granularity. We construct two new datasets to establish the first standardized evaluation suite for this task. Extensive experiments demonstrate that ExeFuse effectively overcomes domain barriers to achieve superior fusion performance.

CLDec 8, 2025
NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models

Feng Liang, Weixin Zeng, Runhao Zhao et al.

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, temporal reasoning, particularly under complex temporal constraints, remains a major challenge. To this end, existing approaches have explored symbolic methods, which encode temporal structure explicitly, and reflective mechanisms, which revise reasoning errors through multi-step inference. Nonetheless, symbolic approaches often underutilize the reasoning capabilities of LLMs, while reflective methods typically lack structured temporal representations, which can result in inconsistent or hallucinated reasoning. As a result, even when the correct temporal context is available, LLMs may still misinterpret or misapply time-related information, leading to incomplete or inaccurate answers. To address these limitations, in this work, we propose Neuro-Symbolic Temporal Reasoning (NeSTR), a novel framework that integrates structured symbolic representations with hybrid reflective reasoning to enhance the temporal sensitivity of LLM inference. NeSTR preserves explicit temporal relations through symbolic encoding, enforces logical consistency via verification, and corrects flawed inferences using abductive reflection. Extensive experiments on diverse temporal question answering benchmarks demonstrate that NeSTR achieves superior zero-shot performance and consistently improves temporal reasoning without any fine-tuning, showcasing the advantage of neuro-symbolic integration in enhancing temporal understanding in large language models.

AIFeb 13
BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Huanyao Zhang, Jiepeng Zhou, Bo Li et al.

Multimodal large language models (MLLMs), equipped with increasingly advanced planning and tool-use capabilities, are evolving into autonomous agents capable of performing multimodal web browsing and deep search in open-world environments. However, existing benchmarks for multimodal browsing remain limited in task complexity, evidence accessibility, and evaluation granularity, hindering comprehensive and reproducible assessments of deep search capabilities. To address these limitations, we introduce BrowseComp-$V^3$, a novel benchmark consisting of 300 carefully curated and challenging questions spanning diverse domains. The benchmark emphasizes deep, multi-level, and cross-modal multi-hop reasoning, where critical evidence is interleaved across textual and visual modalities within and across web pages. All supporting evidence is strictly required to be publicly searchable, ensuring fairness and reproducibility. Beyond final-answer accuracy, we incorporate an expert-validated, subgoal-driven process evaluation mechanism that enables fine-grained analysis of intermediate reasoning behaviors and systematic characterization of capability boundaries. In addition, we propose OmniSeeker, a unified multimodal browsing agent framework integrating diverse web search and visual perception tools. Comprehensive experiments demonstrate that even state-of-the-art models achieve only 36% accuracy on our benchmark, revealing critical bottlenecks in multimodal information integration and fine-grained perception. Our results highlight a fundamental gap between current model capabilities and robust multimodal deep search in real-world settings.

MASep 10, 2022
Cooperation and Competition: Flocking with Evolutionary Multi-Agent Reinforcement Learning

Yunxiao Guo, Xinjia Xie, Runhao Zhao et al.

Flocking is a very challenging problem in a multi-agent system; traditional flocking methods also require complete knowledge of the environment and a precise model for control. In this paper, we propose Evolutionary Multi-Agent Reinforcement Learning (EMARL) in flocking tasks, a hybrid algorithm that combines cooperation and competition with little prior knowledge. As for cooperation, we design the agents' reward for flocking tasks according to the boids model. While for competition, agents with high fitness are designed as senior agents, and those with low fitness are designed as junior, letting junior agents inherit the parameters of senior agents stochastically. To intensify competition, we also design an evolutionary selection mechanism that shows effectiveness on credit assignment in flocking tasks. Experimental results in a range of challenging and self-contrast benchmarks demonstrate that EMARL significantly outperforms the full competition or cooperation methods.

85.9CVApr 6Code
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

DataFlow Team, Bohan Zeng, Daili Hua et al.

World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world models, we propose a clear definition: a world model is a model or framework centered on perception, equipped with interaction and long-term memory capabilities, for understanding and predicting the complex world. We further systematically categorize the essential capabilities of world models. Based on this definition, OpenWorldLib integrates models across different tasks within a unified framework, enabling efficient reuse and collaborative inference. Finally, we present additional reflections and analyses on potential future directions for world model research. Code link: https://github.com/OpenDCAI/OpenWorldLib