MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization

Jinwei Lu, Yuanfeng Song, Chen Zhang, Raymond Chi-Wing Wong

arXiv:2601.18320v1 | h-index: 6

Originality: Highly original

AI Analysis

This work addresses reliability and complexity challenges in automated visualization generation for users who need multi-modal inputs, though it appears to be an incremental enhancement of existing LLM-based approaches.

The paper tackles automated visualization generation for complex multi-modal requirements by proposing MultiVis-Agent, a logic rule-enhanced multi-agent framework. It achieves a visualization score of 75.63% on challenging tasks, outperforming baselines (57.54-62.79%), with a task completion rate of 99.58% and a code execution success rate of 94.56%.

Real-world visualization tasks involve complex, multi-modal requirements that extend beyond simple text-to-chart generation, requiring reference images, code examples, and iterative refinement. Current systems exhibit fundamental limitations: single-modality input, one-shot generation, and rigid workflows. While LLM-based approaches show potential for these complex requirements, they introduce reliability challenges including catastrophic failures and infinite loop susceptibility. To address this gap, we propose MultiVis-Agent, a logic rule-enhanced multi-agent framework for reliable multi-modal and multi-scenario visualization generation. Our approach introduces a four-layer logic rule framework that provides mathematical guarantees for system reliability while maintaining flexibility. Unlike traditional rule-based systems, our logic rules are mathematical constraints that guide LLM reasoning rather than replacing it. We formalize the MultiVis task spanning four scenarios from basic generation to iterative refinement, and develop MultiVis-Bench, a benchmark with over 1,000 cases for multi-modal visualization evaluation. Extensive experiments demonstrate that our approach achieves 75.63% visualization score on challenging tasks, significantly outperforming baselines (57.54-62.79%), with task completion rates of 99.58% and code execution success rates of 94.56% (vs. 74.48% and 65.10% without logic rules), successfully addressing both complexity and reliability challenges in automated visualization generation.
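To put the reported numbers side by side, the short sketch below computes the percentage-point gaps implied by the abstract. It uses only the figures quoted above. The variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (all values are percentages).
ours_vis_score = 75.63
baseline_vis_scores = (57.54, 62.79)  # lowest and highest reported baseline

with_rules = {"task_completion": 99.58, "code_execution": 94.56}
without_rules = {"task_completion": 74.48, "code_execution": 65.10}


def point_gain(new: float, old: float) -> float:
    """Return the difference in percentage points, rounded to two decimals."""
    return round(new - old, 2)


# Gap between MultiVis-Agent and each end of the baseline range.
vis_gains = [point_gain(ours_vis_score, b) for b in baseline_vis_scores]

# Gap between the system with and without logic rules, per metric.
rule_gains = {k: point_gain(with_rules[k], without_rules[k]) for k in with_rules}

print("Visualization score gain over baselines (points):", vis_gains)
print("Gain from logic rules (points):", rule_gains)
```

On these figures, the visualization score sits roughly 13 to 18 points above the baselines, and the logic rules account for about 25 points of task completion and about 29 points of code execution success.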
