CL · Apr 28

Diagnosis, Bad Planning & Reasoning. Treatment, SCOPE -- Planning for Hybrid Querying over Clinical Trial Data

arXiv:2604.25120 · 70.9 · h-index: 6
Predicted impact: top 54% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For researchers and practitioners working on table reasoning in specialized domains (clinical trials), this work provides a practical planner-based decomposition that reduces ambiguity and improves performance over direct prompting and heavier agentic methods.

The paper addresses the problem of reasoning over clinical trial tables where answers require implicit attribute recovery (e.g., therapy type, endpoint roles) rather than direct cell lookup. The proposed SCOPE framework, a multi-LLM planner that decomposes the task into row selection, planning, and execution, achieves improved accuracy on 1,500 hybrid reasoning questions compared to baselines like zero-shot, few-shot, chain-of-thought, TableGPT2, Blend-SQL, and EHRAgent, while offering a better accuracy-efficiency tradeoff.

We study clinical trial table reasoning, where answers are not directly stored in visible cells but must be reasoned from semantic understanding through normalization, classification, extraction, or lightweight domain reasoning. Motivated by the observation that current LLM approaches often suffer from "bad reasoning" under implicit planning assumptions, we focus on settings in which the model must recover implicit attributes such as therapy type, added agents, endpoint roles, or follow-up status from partially observed clinical-trial tables. We propose SCOPE (Structured Clinical hybrid Planning for Evidence retrieval in clinical trials), a multi-LLM planner-based framework that decomposes the task into row selection, structured planning, and execution. The planner makes the source field, reasoning rules, and output constraints explicit before answer generation, reducing ambiguity relative to direct prompting. We evaluate SCOPE on 1,500 hybrid reasoning questions over oncology clinical-trial tables against zero-shot, few-shot, chain-of-thought, TableGPT2, Blend-SQL, and EHRAgent. Results show that explicit multi-LLM planning improves accuracy for reasoning-based questions while offering a stronger accuracy-efficiency tradeoff than heavier agentic baselines. Our findings position clinical trial reasoning as a distinct table understanding problem and highlight hybrid planner-based decomposition as an effective solution.
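
The three stages named in the abstract (row selection, structured planning, execution) can be illustrated with a small Python sketch. This is not the authors' implementation: the LLM calls are replaced by deterministic stand-ins, and every name here (`Plan`, `select_rows`, `make_plan`, `execute`, `classify_therapy`, the `trial_id` and `arm` columns, and the toy table) is a hypothetical chosen for illustration. What it tries to show is the idea that the plan states the source field, the reasoning rule, and the output constraint before any answer is produced.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, str]


@dataclass(frozen=True)
class Plan:
    """Hypothetical explicit plan: where to look, how to reason, what may be returned."""
    source_field: str                  # column the answer is derived from
    rule: Callable[[str], str]         # reasoning rule applied to that column's value
    allowed_outputs: tuple[str, ...]   # output constraint checked at execution time


def select_rows(table: list[Row], trial_id: str) -> list[Row]:
    """Stage 1 (stand-in): keep only the rows relevant to the question."""
    return [row for row in table if row["trial_id"] == trial_id]


def classify_therapy(arm: str) -> str:
    """Toy rule (assumption): an arm listing several agents joined by '+' is a combination."""
    return "combination" if "+" in arm else "monotherapy"


def make_plan(question: str) -> Plan:
    """Stage 2 (stand-in for a planner LLM): emit an explicit plan for the question."""
    if "therapy type" in question.lower():
        return Plan("arm", classify_therapy, ("monotherapy", "combination"))
    raise ValueError(f"no plan template for question: {question!r}")


def execute(plan: Plan, rows: list[Row]) -> list[str]:
    """Stage 3 (stand-in for an executor LLM): apply the plan and enforce its constraint."""
    answers = []
    for row in rows:
        answer = plan.rule(row[plan.source_field])
        if answer not in plan.allowed_outputs:
            raise ValueError(f"answer {answer!r} violates the output constraint")
        answers.append(answer)
    return answers


# Made-up illustrative table; the therapy type is not stored in any cell.
TABLE: list[Row] = [
    {"trial_id": "T1", "arm": "drug A + drug B"},
    {"trial_id": "T1", "arm": "drug A"},
    {"trial_id": "T2", "arm": "drug C"},
]


def answer_question(question: str, trial_id: str, table: list[Row] = TABLE) -> list[str]:
    rows = select_rows(table, trial_id)
    plan = make_plan(question)
    return execute(plan, rows)
```

Calling `answer_question("What is the therapy type of each arm?", "T1")` walks through all three stages. In a real system each stand-in would presumably be a separate model call, with the plan serialized in some structured form, but the abstract does not specify those details.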

