AI · MA · Sep 7, 2025

PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments

Olivier Schipper, Yudi Zhang, Yali Du, Mykola Pechenizkiy, Meng Fang

arXiv:2509.06235v1 · 9.63 citations · h-index: 49 · Has Code · 2025 IEEE Conference on Games (CoG)

Originality: Incremental advance

AI Analysis

This work addresses the need for reproducible benchmarks in competitive multi-agent AI and is most relevant to researchers in AI and game development. Its contribution is incremental, since it builds on existing LLM-based agent frameworks.

The paper tackles the underexplored problem of evaluating LLM-based agents in competitive multi-agent environments by introducing PillagerBench, a framework for real-time team-vs-team scenarios in Minecraft, and TactiCrafter, an LLM-based system that outperforms baselines and demonstrates adaptive learning through self-play.

LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft. It provides an extensible API, multi-round testing, and rule-based built-in opponents for fair, reproducible comparisons. We also propose TactiCrafter, an LLM-based multi-agent system that facilitates teamwork through human-readable tactics, learns causal dependencies, and adapts to opponent strategies. Our evaluation demonstrates that TactiCrafter outperforms baseline approaches and showcases adaptive learning through self-play. Additionally, we analyze its learning process and strategic evolution over multiple game episodes. To encourage further research, we have open-sourced PillagerBench, fostering advancements in multi-agent AI for competitive environments.
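A minimal toy harness can illustrate how multi-round testing against a rule-based built-in opponent supports reproducible comparisons. Everything below is hypothetical: the three-action game, `RuleBasedOpponent`, `AdaptiveAgent`, `Result`, and `evaluate` are illustrative names, not PillagerBench's API, and the adaptive agent is a simple frequency counter rather than an LLM-based system like TactiCrafter. The sketch shows two ideas: a fixed seed makes a run repeatable, and a deterministic opponent lets an agent's adaptation across rounds show up in its win rate.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy game (not a PillagerBench scenario): a cyclic
# "X beats Y" relation, like rock-paper-scissors.
BEATS = {"attack": "gather", "gather": "defend", "defend": "attack"}
ACTIONS = sorted(BEATS)


class RuleBasedOpponent:
    """Hypothetical built-in opponent with a fixed, deterministic policy."""

    def act(self, round_idx: int) -> str:
        # Attacks in most rounds, but gathers every third round.
        return "gather" if round_idx % 3 == 2 else "attack"


class AdaptiveAgent:
    """Hypothetical agent that counters the opponent's most frequent past action."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.seen: Counter = Counter()

    def act(self, round_idx: int) -> str:
        if not self.seen:
            # No history yet: pick a seeded random action.
            return self.rng.choice(ACTIONS)
        likely = self.seen.most_common(1)[0][0]
        # Play the action that beats the opponent's most likely action.
        return next(a for a, beaten in BEATS.items() if beaten == likely)

    def observe(self, opponent_action: str) -> None:
        self.seen[opponent_action] += 1


@dataclass
class Result:
    wins: int = 0
    losses: int = 0
    draws: int = 0

    @property
    def win_rate(self) -> float:
        total = self.wins + self.losses + self.draws
        return self.wins / total if total else 0.0


def evaluate(rounds: int, seed: int) -> Result:
    """Play `rounds` rounds; the same seed always yields the same Result."""
    rng = random.Random(seed)
    agent, opponent, result = AdaptiveAgent(rng), RuleBasedOpponent(), Result()
    for i in range(rounds):
        a, o = agent.act(i), opponent.act(i)
        if BEATS[a] == o:
            result.wins += 1
        elif BEATS[o] == a:
            result.losses += 1
        else:
            result.draws += 1
        agent.observe(o)  # the agent updates its opponent model between rounds
    return result


if __name__ == "__main__":
    print(evaluate(rounds=30, seed=0))
```

Calling `evaluate(rounds=30, seed=0)` twice returns identical results. After the first round, this agent loses only in rounds where the opponent switches to `gather`, which is the kind of exploitable pattern a multi-round evaluation can expose.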

