DB · AI · Feb 15

TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models

arXiv:2602.14089v11 citations
Originality: Highly original
AI Analysis

This paper addresses inefficiencies in LLM-based table reasoning: it offers a new method that improves accuracy and reduces token cost, although it builds incrementally on existing agent-based approaches.

The paper tackles complex table reasoning with large language models by proposing TabTracer, an agentic framework that uses Monte Carlo Tree Search for step-level verification and rollback. It reports up to 6.7% higher accuracy and 59-84% lower token consumption than state-of-the-art baselines.

Large language models (LLMs) have emerged as powerful tools for natural language table reasoning, and existing methods fall into two main categories. Prompt-based approaches rely on language-only inference or one-pass program generation without step-level verification. Agent-based approaches use tools in a closed loop, but verification is often local and backtracking is limited, allowing errors to propagate and increasing cost. Moreover, they rely on chain- or beam-style trajectories that are typically combinatorially redundant, leading to high token costs. In this paper, we propose TabTracer, an agentic framework that coordinates multi-step tool calls over intermediate table states, with explicit state tracking for verification and rollback. First, it enforces step-level verification with typed operations and lightweight numeric and format checks to provide reliable rewards and suppress hallucinations. Second, execution-feedback Monte Carlo Tree Search maintains a search tree of candidate table states and uses backpropagated reflection scores to guide UCB1 selection and rollback via versioned snapshots. Third, it reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost. Comprehensive evaluation on the TabFact, WikiTQ, and CRT datasets shows that TabTracer outperforms state-of-the-art baselines by up to 6.7% in accuracy while reducing token consumption by 59-84%.
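The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `state_hash`, `expand`, `select_child`, and `backpropagate` are hypothetical, the scalar reward stands in for the paper's reflection score, and the exploration constant is an assumed default. The sketch shows only UCB1 selection, reward backpropagation, and hash-based deduplication of table states; pruning and the monotonicity gate are omitted.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# A table state is an immutable tuple of rows, so every node keeps its own
# snapshot and "rollback" simply means continuing the search from an ancestor.
Table = Tuple[Tuple[str, ...], ...]


def state_hash(table: Table) -> str:
    """Stable digest of a table state, used to detect duplicate states."""
    return hashlib.sha256(repr(table).encode("utf-8")).hexdigest()


@dataclass
class Node:
    table: Table
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))


def expand(node: Node, table: Table, seen: Set[str]) -> Optional[Node]:
    """Attach a new table state unless an identical state was already seen."""
    key = state_hash(table)
    if key in seen:
        return None  # duplicate state: skip it instead of re-exploring
    seen.add(key)
    child = Node(table=table, parent=node)
    node.children.append(child)
    return child


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add the reward to the node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


# Toy usage with made-up data: two distinct tool calls and one duplicate.
root = Node(table=(("city", "pop"), ("Oslo", "700000"), ("Bergen", "290000")))
seen = {state_hash(root.table)}
filtered = expand(root, (("city", "pop"), ("Oslo", "700000")), seen)
projected = expand(root, (("city",), ("Oslo",), ("Bergen",)), seen)
duplicate = expand(root, (("city", "pop"), ("Oslo", "700000")), seen)

backpropagate(filtered, 1.0)   # this step passed its checks
backpropagate(projected, 0.0)  # this step failed its checks
best = select_child(root)
```

In this toy run the duplicate state is rejected by its hash, and `select_child` prefers the branch whose step earned the higher reward, since both children have been visited once and therefore share the same exploration bonus.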

