AI | Aug 22, 2024

GRATR: Zero-Shot Evidence Graph Retrieval-Augmented Trustworthiness Reasoning

arXiv:2408.12333v3 | h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the problem of identifying allies and adversaries in incomplete-information games, with potential real-world applications such as intent analysis. The contribution appears incremental, since it builds on existing retrieval-augmented and graph-based methods.

The paper tackles trustworthiness reasoning in multiplayer games by introducing the GRATR framework, which uses zero-shot evidence graph retrieval with LLMs to improve decision-making. In experiments on the game Werewolf, it reports a 50.5% increase in reasoning accuracy and a 30.6% reduction in hallucination compared to the baseline method.

Trustworthiness reasoning aims to enable agents in multiplayer games with incomplete information to identify potential allies and adversaries, thereby enhancing decision-making. In this paper, we introduce the graph retrieval-augmented trustworthiness reasoning (GRATR) framework, which retrieves observable evidence from the game environment to inform decision-making by large language models (LLMs) without requiring additional training, making it a zero-shot approach. Within the GRATR framework, agents first observe the actions of other players and evaluate the resulting shifts in inter-player trust, constructing a corresponding trustworthiness graph. During decision-making, the agent performs multi-hop retrieval to evaluate trustworthiness toward a specific target, where evidence chains are retrieved from multiple trusted sources to form a comprehensive assessment. Experiments in the multiplayer game Werewolf demonstrate that GRATR outperforms the alternatives, improving reasoning accuracy by 50.5% and reducing hallucination by 30.6% compared to the baseline method. Additionally, when tested on a dataset of Twitter tweets during the U.S. election period, GRATR surpasses the baseline method by 10.4% in accuracy, highlighting its potential in real-world applications such as intent analysis.
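
The two stages described in the abstract (building a trustworthiness graph from observed actions, then retrieving multi-hop evidence chains toward a target) can be illustrated with a small Python sketch. This is not the authors' implementation. The class name `TrustGraph`, the additive clamped update in `observe`, the product-of-edges chain score, and the mean aggregation in `trust` are all assumptions chosen for illustration; the paper's actual update and aggregation rules may differ. The step that would pass the retrieved evidence to an LLM is omitted.

```python
class TrustGraph:
    """Directed graph: edge (a, b) stores a's trust in b, in [-1, 1], plus evidence.

    Hypothetical sketch: the update and scoring rules are illustrative assumptions.
    """

    def __init__(self):
        self.edges = {}  # source -> {target: (score, [evidence strings])}

    def observe(self, source, target, shift, evidence):
        """Record an observed action that shifts source's trust toward target."""
        score, notes = self.edges.get(source, {}).get(target, (0.0, []))
        score = max(-1.0, min(1.0, score + shift))  # assumed rule: add, then clamp
        self.edges.setdefault(source, {})[target] = (score, notes + [evidence])

    def evidence_chains(self, agent, target, max_hops=2):
        """Collect (chain_score, path, evidence) for simple paths agent -> target."""
        chains = []

        def walk(node, path, score, notes, hop):
            if hop > max_hops:
                return
            for nxt, (s, ev) in self.edges.get(node, {}).items():
                if nxt in path:
                    continue  # keep paths simple (no revisits)
                if nxt == target:
                    chains.append((score * s, path + [nxt], notes + ev))
                elif s > 0:
                    # Only relay through intermediaries the chain currently trusts.
                    walk(nxt, path + [nxt], score * s, notes + ev, hop + 1)

        walk(agent, [agent], 1.0, [], 1)
        return chains

    def trust(self, agent, target, max_hops=2):
        """Aggregate chain scores (assumed rule: plain mean; 0.0 if no evidence)."""
        chains = self.evidence_chains(agent, target, max_hops)
        if not chains:
            return 0.0
        return sum(score for score, _, _ in chains) / len(chains)


# Usage with made-up Werewolf-style observations.
graph = TrustGraph()
graph.observe("P1", "P2", 0.8, "P2 defended P1 during the vote")
graph.observe("P2", "P3", -0.5, "P2 accused P3 of being a werewolf")
graph.observe("P1", "P3", 0.2, "P3 voted the same way as P1")

chains = graph.evidence_chains("P1", "P3")
overall = graph.trust("P1", "P3")
for score, path, notes in chains:
    print(round(score, 3), " -> ".join(path), notes)
print("aggregated trust of P1 toward P3:", round(overall, 3))
```

In this toy example the direct evidence about P3 is mildly positive, while the chain through the trusted player P2 is negative, so the aggregated assessment differs from what the direct edge alone would suggest. The evidence strings attached to each chain are what a retrieval-augmented setup would place into the LLM prompt.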

Code Implementations: 1 repo