LG AIFeb 12

TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction

Yongyao Wang, Ziqi Miao, Lu Yang, Haonan Jia, Wenting Yan, Chen Qian, Lijun Li

arXiv:2602.11700v12.72 citations

Originality Highly original

AI Analysis

This addresses the challenge of robust and interpretable few-shot learning in tabular data for machine learning practitioners, representing an incremental improvement with a novel method for a known bottleneck.

The paper tackles the problem of inconsistent evidence usage and noise sensitivity in tabular prediction by proposing TabSieve, a select-then-predict framework that explicitly selects informative rows as evidence, resulting in average performance gains of 2.92% on classification and 4.45% on regression over baselines.

Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference and LLM-based prompting is often brittle. Models do not consistently leverage relevant rows, and noisy context can degrade performance. To address this challenge, we propose TabSieve, a select-then-predict framework that makes evidence usage explicit and auditable. Given a table and a query row, TabSieve first selects a small set of informative rows as evidence and then predicts the missing target conditioned on the selected evidence. To enable this capability, we construct TabSieve-SFT-40K by synthesizing high-quality reasoning trajectories from 331 real tables using a strong teacher model with strict filtering. Furthermore, we introduce TAB-GRPO, a reinforcement learning recipe that jointly optimizes evidence selection and prediction correctness with separate rewards, and stabilizes mixed regression and classification training via dynamic task-advantage balancing. Experiments on a held-out benchmark of 75 classification and 52 regression tables show that TabSieve consistently improves performance across shot budgets, with average gains of 2.92% on classification and 4.45% on regression over the second-best baseline. Further analysis indicates that TabSieve concentrates more attention on the selected evidence, which improves robustness to noisy context.

View on arXiv PDF

Similar