MA AI CL | Sep 15, 2025

MALLM: Multi-Agent Large Language Models Framework

Jonas Becker, Lars Benedikt Kaesberg, Niklas Bauer, Jan Philip Wahle, Terry Ruas, Bela Gipp

arXiv:2509.11656v28.610 citations | h-index: 14 | Has Code | EMNLP

Originality: Synthesis-oriented

AI Analysis

MALLM gives researchers a tool to systematically study multi-agent debate components, but the contribution is incremental: it builds on existing multi-agent debate concepts and mainly adds configurability.

The paper tackles the problem of limited configurability and evaluation in multi-agent debate frameworks by introducing MALLM, an open-source framework that enables systematic analysis of over 144 unique configurations, including agent personas and decision protocols, to facilitate understanding of debate components.

Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.
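To make the idea of a configuration space concrete, the following is a minimal sketch and not MALLM's actual API or configuration format. It assumes each debate is described by one choice per component and uses only the example options named in the abstract, so it yields 16 combinations rather than the 144+ the framework reports. The names `COMPONENTS`, `enumerate_configs`, and `majority_vote` are hypothetical, and the voting function is a generic majority vote standing in for a decision protocol.

```python
from collections import Counter
from itertools import product

# Hypothetical option lists: only the examples named in the abstract.
# The real framework offers more options per component.
COMPONENTS = {
    "persona": ["Expert", "Personality"],
    "response_generator": ["Critical", "Reasoning"],
    "paradigm": ["Memory", "Relay"],
    "decision_protocol": ["Voting", "Consensus"],
}


def enumerate_configs(components):
    """Return every combination of one option per component as a dict."""
    keys = list(components)
    return [
        dict(zip(keys, combo))
        for combo in product(*(components[k] for k in keys))
    ]


def majority_vote(answers):
    """Generic majority vote; ties go to the answer that appeared first."""
    return Counter(answers).most_common(1)[0][0]


configs = enumerate_configs(COMPONENTS)
print(len(configs))   # number of combinations in this toy grid
print(configs[0])     # first combination
print(majority_vote(["B", "A", "B"]))
```

Each dictionary here plays the role that a configuration file plays in the abstract's description: a complete specification of one debate setup that an evaluation loop could iterate over.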
