TheraAgent: Self-Improving Therapeutic Agent for Precise and Comprehensive Treatment Planning
For clinical decision support, this framework improves the precision, completeness, and safety of LLM-generated treatment plans by mirroring human expert revision processes.
TheraAgent replaces one-shot LLM generation with an iterative generate-judge-refine pipeline for treatment planning, achieving state-of-the-art results on HealthBench and an 86% win rate against physicians in expert evaluations.
Formulating a treatment plan is inherently a complex reasoning and refinement task rather than a simple generation problem. However, existing large language models (LLMs) mainly rely on one-shot generation without explicit verification, which can produce coarse, incomplete, and potentially unsafe treatment plans. To address these limitations, we propose TheraAgent, an agentic framework that replaces one-shot generation with an iterative generate-judge-refine pipeline. By mirroring the reasoning process of human experts, who iteratively revise treatment plans, our framework progressively transforms coarse and incomplete drafts into precise, comprehensive, and safer therapeutic regimens. To support the critical judge component, we introduce TheraJudge, a treatment-specific evaluation module integrated into the inference loop to enforce clinical standards. Experiments show that TheraAgent achieves state-of-the-art results on HealthBench, leading in Accuracy and Completeness. In expert evaluations, it attains an 86% win rate against physicians, with superior Targeting and Harm Control. Moreover, the high agreement between TheraJudge and HealthBench evaluations confirms the reliability of our framework.
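The generate-judge-refine loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `Verdict` type, the `max_rounds` stopping rule, and the toy keyword-based judge are all assumptions made for illustration. In the actual framework, the generator and refiner would be LLM calls and the judge would be TheraJudge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """Hypothetical judge output: pass/fail plus a list of issues to fix."""
    passed: bool
    issues: List[str]


def generate_judge_refine(
    case: str,
    generate: Callable[[str], str],
    judge: Callable[[str, str], Verdict],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,  # assumed stopping rule, not from the paper
) -> str:
    """Draft a plan, then alternate judging and refining until it passes."""
    plan = generate(case)
    for _ in range(max_rounds):
        verdict = judge(case, plan)
        if verdict.passed:
            break
        plan = refine(case, plan, verdict.issues)
    return plan


# Toy stand-ins (assumptions) so the sketch runs without any model.
REQUIRED = ("dose", "follow-up")


def toy_generate(case: str) -> str:
    return f"Treat {case} with drug A."


def toy_judge(case: str, plan: str) -> Verdict:
    # Flags required elements that the plan does not mention yet.
    missing = [item for item in REQUIRED if item not in plan]
    return Verdict(passed=not missing, issues=missing)


def toy_refine(case: str, plan: str, issues: List[str]) -> str:
    # Addresses one issue per round, mimicking incremental revision.
    return plan + f" Specify {issues[0]}."


final_plan = generate_judge_refine(
    "hypertension", toy_generate, toy_judge, toy_refine
)
print(final_plan)
```

Passing the three stages as callables keeps the loop independent of any particular model or judging rubric, so the same control flow applies whether the judge is a keyword check, as here, or a treatment-specific evaluator.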