AI CL MA · Jul 7, 2025

MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

arXiv:2507.04893v1 · 1 citation · h-index: 2

Originality Highly original

AI Analysis

This paper addresses accurate and interpretable accident severity prediction for transportation safety systems, and it reports a strong, specific gain rather than an incremental improvement.

The paper tackles accident severity prediction, a difficult task due to incomplete data and severe class imbalance, by proposing MARBLE, a multi-agent rule-based LLM reasoning engine that decomposes the task across specialized agents; it achieves nearly 90% accuracy on UK and US datasets, outperforming traditional classifiers and SOTA prompt-based methods that plateau below 48%.

Accident severity prediction plays a critical role in transportation safety systems but is a persistently difficult task due to incomplete data, strong feature dependencies, and severe class imbalance in which rare but high-severity cases are underrepresented and hard to detect. Existing methods often rely on monolithic models or black-box prompting, which struggle to scale in noisy, real-world settings and offer limited interpretability. To address these challenges, we propose MARBLE, a multi-agent rule-based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents, including an interchangeable ML-backed agent. Each agent focuses on a semantic subset of features (e.g., spatial, environmental, temporal), enabling scoped reasoning and modular prompting without the risk of prompt saturation. Predictions are coordinated through either rule-based or LLM-guided consensus mechanisms that account for class rarity and confidence dynamics. The system retains structured traces of agent-level reasoning and coordination outcomes, supporting in-depth interpretability and post-hoc performance diagnostics. Across both UK and US datasets, MARBLE consistently outperforms traditional machine learning classifiers and state-of-the-art (SOTA) prompt-based reasoning methods, including Chain-of-Thought (CoT), Least-to-Most (L2M), and Tree-of-Thought (ToT), achieving nearly 90% accuracy where others plateau below 48%. This performance redefines the practical ceiling for accident severity classification under real-world noise and extreme class imbalance. Our results position MARBLE as a generalizable and interpretable framework for reasoning under uncertainty in safety-critical applications.
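
To make the coordination idea concrete, here is a minimal Python sketch of feature-scoped agents feeding a rule-based consensus step that weighs confidence against class rarity. It is an illustration under stated assumptions, not the authors' implementation: the feature names, the class priors, the `Vote` type, and the `rarity_power` weighting rule are all hypothetical, and the agent outputs are hard-coded rather than produced by an LLM.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical feature scopes; the abstract names spatial, environmental,
# and temporal subsets, but the concrete feature names here are assumed.
FEATURE_SCOPES = {
    "spatial": ["road_type", "speed_limit"],
    "environmental": ["weather", "light"],
    "temporal": ["hour", "day_of_week"],
}

# Assumed class priors for illustration only (not taken from the paper).
CLASS_PRIORS = {"slight": 0.85, "serious": 0.13, "fatal": 0.02}


@dataclass
class Vote:
    agent: str
    label: str
    confidence: float  # expected to lie in [0, 1]


def scope_record(record, scope):
    """Return only the features that belong to one agent's scope."""
    return {k: record[k] for k in FEATURE_SCOPES[scope] if k in record}


def rule_based_consensus(votes, priors=CLASS_PRIORS, rarity_power=0.5):
    """Pick the label with the highest rarity-adjusted confidence mass.

    Each vote contributes confidence * (1 / prior) ** rarity_power, so a
    confident vote for a rare class is not drowned out by the majority.
    Setting rarity_power to 0 reduces this to plain confidence-weighted voting.
    """
    scores = defaultdict(float)
    for v in votes:
        scores[v.label] += v.confidence * (1.0 / priors[v.label]) ** rarity_power
    winner = max(scores, key=scores.get)
    # Keep a structured trace so the decision can be inspected afterwards.
    trace = {
        "votes": [vars(v) for v in votes],
        "scores": dict(scores),
        "winner": winner,
    }
    return winner, trace


if __name__ == "__main__":
    votes = [
        Vote("spatial", "slight", 0.60),
        Vote("environmental", "slight", 0.55),
        Vote("temporal", "fatal", 0.90),
    ]
    winner, trace = rule_based_consensus(votes)
    print(winner, trace["scores"])
```

In this toy setup, two moderately confident "slight" votes lose to a single highly confident "fatal" vote once rarity is taken into account, while plain confidence-weighted voting (`rarity_power=0.0`) would pick "slight". The returned trace mirrors, in a very reduced form, the kind of structured record the abstract describes for post-hoc diagnostics.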
